
DuckDB - Retrieving MOT test data

·2720 words·13 mins

Using the Open Data Anonymised MOT tests and results to try out DuckDB on ‘large’ data locally.

The first step is to download all the compressed files from the Open Data website. A few minutes later there are over 25GB of files. Here are the resulting files…

-rw-rw-r-- 1 simon simon  451M Jan 14 15:36 dft_test_item_2017.zip
-rw-rw-r-- 1 simon simon  366M Jan 14 15:34 dft_test_item_2018.zip
-rw-rw-r-- 1 simon simon  394M Jan 14 15:34 dft_test_item_2019.zip
-rw-rw-r-- 1 simon simon  367M Jan 14 15:34 dft_test_item_2020.zip
-rw-rw-r-- 1 simon simon  602M Jan 14 15:38 dft_test_item_2021.zip
-rw-rw-r-- 1 simon simon  416M Jan 14 15:34 dft_test_item_2022.zip
-rw-rw-r-- 1 simon simon  439M Jan 14 15:22 dft_test_item_2023.zip
-rw-rw-r-- 1 simon simon  1.1G Jan 14 15:46 dft_test_result_2017.zip
-rw-rw-r-- 1 simon simon  1.1G Jan 14 15:46 dft_test_result_2018.zip
-rw-rw-r-- 1 simon simon  1.1G Jan 14 15:46 dft_test_result_2019.zip
-rw-rw-r-- 1 simon simon  1.1G Jan 14 15:46 dft_test_result_2020.zip
-rw-rw-r-- 1 simon simon  1.2G Jan 14 15:46 dft_test_result_2021.zip
-rw-rw-r-- 1 simon simon  1.1G Jan 14 15:46 dft_test_result_2022.zip
-rw-rw-r-- 1 simon simon  1.2G Jan  8 15:09 dft_test_result_2023.zip
-rw-rw-r-- 1 simon simon  249K Mar  8 11:00 lookup.zip
-rw-rw-r-- 1 simon simon   47M Jan 14 15:23 test_item_2005.txt.gz
-rw-rw-r-- 1 simon simon  218M Jan 14 15:30 test_item_2006.txt.gz
-rw-rw-r-- 1 simon simon  252M Jan 14 15:31 test_item_2007.txt.gz
-rw-rw-r-- 1 simon simon  283M Jan 14 15:32 test_item_2008.txt.gz
-rw-rw-r-- 1 simon simon  309M Jan 14 15:33 test_item_2009.txt.gz
-rw-rw-r-- 1 simon simon  313M Jan 14 15:33 test_item_2010.txt.gz
-rw-rw-r-- 1 simon simon  323M Jan 14 15:33 test_item_2011.txt.gz
-rw-rw-r-- 1 simon simon  335M Jan 14 15:33 test_item_2012.txt.gz
-rw-rw-r-- 1 simon simon  348M Jan 14 15:33 test_item_2013.txt.gz
-rw-rw-r-- 1 simon simon  347M Jan 14 15:33 test_item_2014.txt.gz
-rw-rw-r-- 1 simon simon  332M Jan 14 15:33 test_item_2015.txt.gz
-rw-rw-r-- 1 simon simon  333M Jan 14 15:33 test_item_2016.txt.gz
-rw-rw-r-- 1 simon simon  205M Jan 14 15:29 test_result_2005.txt.gz
-rw-rw-r-- 1 simon simon  874M Jan 14 15:44 test_result_2006.txt.gz
-rw-rw-r-- 1 simon simon  917M Jan 14 15:45 test_result_2007.txt.gz
-rw-rw-r-- 1 simon simon  937M Jan 14 15:45 test_result_2008.txt.gz
-rw-rw-r-- 1 simon simon  957M Jan 14 15:45 test_result_2009.txt.gz
-rw-rw-r-- 1 simon simon  972M Jan 14 15:46 test_result_2010.txt.gz
-rw-rw-r-- 1 simon simon  992M Jan 14 15:46 test_result_2011.txt.gz
-rw-rw-r-- 1 simon simon  993M Jan 14 15:46 test_result_2012.txt.gz
-rw-rw-r-- 1 simon simon 1009M Jan 14 15:46 test_result_2013.txt.gz
-rw-rw-r-- 1 simon simon 1016M Jan 14 15:46 test_result_2014.txt.gz
-rw-rw-r-- 1 simon simon  1.0G Jan 14 15:46 test_result_2015.txt.gz
-rw-rw-r-- 1 simon simon  1.1G Jan 14 15:46 test_result_2016.txt.gz
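To confirm the “over 25GB” figure without adding up the listing by hand, here is a small standard-library Python sketch that totals the archive sizes in a folder. The function names and the glob patterns are assumptions based on the file names above, not part of any tool used in this post.

```python
from pathlib import Path


def total_download_size(directory, patterns=("*.zip", "*.gz")) -> int:
    """Sum the sizes, in bytes, of the downloaded archives in `directory`."""
    root = Path(directory).expanduser()
    total = 0
    for pattern in patterns:
        for path in root.glob(pattern):
            if path.is_file():
                total += path.stat().st_size
    return total


def human_gib(n_bytes: int) -> str:
    """Format a byte count in GiB, the unit behind the 'G' suffix of `ls -lh`."""
    return f"{n_bytes / 2**30:.1f}G"
```

Calling `human_gib(total_download_size("~/Documents/mot_data"))` should give roughly the sum of the size column above. Remember that `ls -lh` reports powers of 1024, so its “G” is slightly larger than a decimal gigabyte.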

Next, extract each of the files and check the data. Initially I’ll focus on just the test results and return to the failure items later.

2023

simon@NUC:~/Documents/mot_data$ unzip dft_test_result_2023.zip 
Archive:  dft_test_result_2023.zip
  inflating: test_result.csv         
simon@NUC:~/Documents/mot_data$ mv test_result.csv test_result_2023.csv
simon@NUC:~/Documents/mot_data$ ls -l test_result_2023.csv 
-rw-rw-r-- 1 simon simon 3661104239 Feb  5  2024 test_result_2023.csv
simon@NUC:~/Documents/mot_data$ wc -l test_result_2023.csv 
42216722 test_result_2023.csv
simon@NUC:~/Documents/mot_data$ head test_result_2023.csv 
test_id|vehicle_id|test_date|test_class_id|test_type|test_result|test_mileage|postcode_area|make|model|colour|fuel_type|cylinder_capacity|first_use_date
1994821045|838565361|2023-01-02|4|NT|P|179357|NW|TOYOTA|PRIUS +|WHITE|HY|1798|2016-06-17
358005195|484499974|2023-01-01|4|NT|P|300072|B|TOYOTA|PRIUS|RED|HY|1500|2008-09-13
773392437|53988366|2023-01-02|4|NT|PRS|307888|HA|TOYOTA|PRIUS|GREY|HY|1497|2010-01-15
133665147|606755010|2023-01-02|4|NT|F|65810|SE|TOYOTA|PRIUS|SILVER|HY|1497|2007-03-28
656743571|606755010|2023-01-02|4|RT|P|65810|SE|TOYOTA|PRIUS|SILVER|HY|1497|2007-03-28
607277335|1307416223|2023-01-02|4|NT|P|211242|NW|TOYOTA|PRIUS|PURPLE|HY|1790|2016-02-01
1779040733|984166795|2023-01-02|4|NT|P|150344|UB|TOYOTA|PRIUS|WHITE|HY|1797|2019-12-20
234737553|21541545|2023-01-02|4|NT|P|28649|HA|TOYOTA|PRIUS|BLUE|HY|1797|2020-07-01
1125876917|1074624265|2023-01-02|4|NT|P|98679|E|TOYOTA|PRIUS|RED|HY|1798|2016-11-18

That’s 3.66GB of pipe-delimited CSV (about 3.4GiB, which is how ls -lh would report it) with 42.2 million lines. A quick look at the first few rows shows the data looks good. Now to repeat for 2022.
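The same three checks (size, line count and first rows) can also be done from Python. This is a standard-library sketch that assumes UTF-8 text and the pipe delimiter seen in the 2023 header. `inspect_csv` and `count_newlines` are illustrative names only.

```python
import csv
from itertools import islice
from pathlib import Path


def count_newlines(path, chunk_size=1 << 20) -> int:
    """Count b'\\n' bytes in the file, which is what `wc -l` reports."""
    n = 0
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            n += chunk.count(b"\n")
    return n


def inspect_csv(path, delimiter="|", preview=3):
    """Return (size_in_bytes, line_count, header_fields, first_rows) for one CSV."""
    p = Path(path).expanduser()
    with p.open(newline="", encoding="utf-8") as f:
        reader = csv.reader(f, delimiter=delimiter)
        header = next(reader)
        rows = list(islice(reader, preview))
    return p.stat().st_size, count_newlines(p), header, rows
```

Like `wc -l`, the line count includes the header. Assuming the file ends with a newline, the 2023 file therefore holds 42,216,721 tests.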

2022

simon@NUC:~/Documents/mot_data$ unzip dft_test_result_2022.zip 
Archive:  dft_test_result_2022.zip
  inflating: test_result_2022.csv    
simon@NUC:~/Documents/mot_data$ ls -lh test_result_2022.csv 
-rw-rw-r-- 1 simon simon 3.4G Dec  8  2023 test_result_2022.csv
simon@NUC:~/Documents/mot_data$ wc -l test_result_2022.csv 
41632879 test_result_2022.csv
simon@NUC:~/Documents/mot_data$ head test_result_2022.csv 
test_id|vehicle_id|test_date|test_class_id|test_type|test_result|test_mileage|postcode_area|make|model|colour|fuel_type|cylinder_capacity|first_use_date
334683447|634775234|2022-01-01|4|NT|P|227219|E|TOYOTA|PRIUS|SILVER|HY|1497|2008-01-17
586095521|1220215709|2022-01-01|4|NT|P|136552|CR|TOYOTA|PRIUS|GREY|HY|1798|2013-11-29
960974211|1315791989|2022-01-01|4|NT|F|129847|E|TOYOTA|PRIUS|WHITE|HY|1798|2018-01-01
1041792341|1144451355|2022-01-01|4|NT|P|123133|TW|TOYOTA|PRIUS|SILVER|HY|1496|2016-11-21
1587264975|1315791989|2022-01-01|4|RT|P|129848|E|TOYOTA|PRIUS|WHITE|HY|1798|2018-01-01
1032834657|1310098304|2022-01-01|4|NT|PRS|238117|IG|TOYOTA|PRIUS|BLACK|HY|1798|2012-06-29
51919479|483214935|2022-01-01|4|NT|P|110322|E|TOYOTA|PRIUS|SILVER|HY|1800|2017-04-01
1616476935|1262912232|2022-01-01|4|NT|P|161933|BD|TOYOTA|PRIUS|SILVER|HY|1497|2005-12-30
640040599|302221893|2022-01-01|4|NT|P|47101|IG|TOYOTA|PRIUS|SILVER|HY|1797|2016-06-28

File naming isn’t consistent between years: this time the extracted file already has the year appended. This file is 3.4G, with 41.6 million lines.

2021

simon@NUC:~/Documents/mot_data$ unzip dft_test_result_2021.zip
Archive:  dft_test_result_2021.zip
  inflating: test_result_2022/test_result_20220531131730_32355.csv
  inflating: test_result_2022/test_result_20220531131730_32357.csv
  inflating: test_result_2022/test_result_20220531131730_32360.csv
  inflating: test_result_2022/test_result_20220531131730_32361.csv
  inflating: test_result_2022/test_result_20220531131730_32365.csv
  inflating: test_result_2022/test_result_20220531131730_32367.csv
  inflating: test_result_2022/test_result_20220531131730_32370.csv
  inflating: test_result_2022/test_result_20220531131730_32372.csv
  inflating: test_result_2022/test_result_20220531131730_32375.csv
  inflating: test_result_2022/test_result_20220531131730_32378.csv
  inflating: test_result_2022/test_result_20220531131730_32384.csv
  inflating: test_result_2022/test_result_20220531131730_32386.csv
simon@NUC:~/Documents/mot_data$ cd test_result_2022/
simon@NUC:~/Documents/mot_data/test_result_2022$ ls -lh
total 4.2G
-rw-r--r-- 1 simon simon 354M May 31  2022 test_result_20220531131730_32355.csv
-rw-r--r-- 1 simon simon 354M May 31  2022 test_result_20220531131730_32357.csv
-rw-r--r-- 1 simon simon 354M May 31  2022 test_result_20220531131730_32360.csv
-rw-r--r-- 1 simon simon 354M May 31  2022 test_result_20220531131730_32361.csv
-rw-r--r-- 1 simon simon 354M May 31  2022 test_result_20220531131730_32365.csv
-rw-r--r-- 1 simon simon 354M May 31  2022 test_result_20220531131730_32367.csv
-rw-r--r-- 1 simon simon 354M May 31  2022 test_result_20220531131730_32370.csv
-rw-r--r-- 1 simon simon 354M May 31  2022 test_result_20220531131730_32372.csv
-rw-r--r-- 1 simon simon 354M May 31  2022 test_result_20220531131730_32375.csv
-rw-r--r-- 1 simon simon 355M May 31  2022 test_result_20220531131730_32378.csv
-rw-r--r-- 1 simon simon 354M May 31  2022 test_result_20220531131730_32384.csv
-rw-r--r-- 1 simon simon 354M May 31  2022 test_result_20220531131730_32386.csv

OK, so dft_test_result_2021.zip creates a test_result_2022 folder containing twelve files whose names carry a 20220531131730 timestamp rather than the year of the data. Let’s check their contents…
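Surprises like this can be spotted before extracting anything. `unzip -l` prints an archive’s contents, and Python’s zipfile module can do the same. Here is a minimal sketch; `list_members` is just an illustrative name.

```python
import zipfile


def list_members(archive_path):
    """Return (name, uncompressed size in bytes) for every entry in a zip archive."""
    with zipfile.ZipFile(archive_path) as zf:
        return [(info.filename, info.file_size) for info in zf.infolist()]
```

Running this over each dft_test_result_*.zip would show the folder names, file names and expanded sizes up front. That is useful here, since the year in a file name can’t be trusted.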

simon@NUC:~/Documents/mot_data/test_result_2022$ head *
==> test_result_20220531131730_32355.csv <==
"test_id","vehicle_id","test_date","test_class_id","test_type","test_result","test_mileage","postcode_area","make","model","colour","fuel_type","cylinder_capacity","first_use_date"
1488085241,298646303,"2021-01-01","4","NT","P","113094","PO","VOLKSWAGEN","CADDY","WHITE","DI","1598","2013-01-01"
1360139783,1372832822,"2021-01-01","4","NT","P","146500","LU","VAUXHALL","ASTRA","BLUE","DI","1686","2006-09-29"
1232194325,152373223,"2021-01-01","4","NT","F","96459","DE","VAUXHALL","MOKKA","WHITE","DI","1686","2013-04-27"
464521577,17056716,"2021-01-01","4","NT","P","201104","B","HONDA","JAZZ","BLACK","PE","1339","2005-10-31"
848357951,888720926,"2021-01-01","4","NT","P","160067","IP","PEUGEOT","407","RED","DI","1997","2007-06-29"
80685203,471452873,"2021-01-01","4","NT","P","18017","W","MERCEDES-BENZ","B-CLASS","BLACK","DI","1796","2012-12-28"
592467035,85469405,"2021-01-01","4","NT","P","129977","E","BMW","3 SERIES","BLACK","OT","2979","2013-09-26"
720412493,216763752,"2021-01-01","4","NT","F","75954","WF","FIAT","QUBO","SILVER","DI","1248","2011-02-09"
1824794287,938472162,"2021-01-01","4","NT","F","106640","NG","VAUXHALL","VIVARO","SILVER","DI","1998","2012-06-21"

==> test_result_20220531131730_32357.csv <==
"test_id","vehicle_id","test_date","test_class_id","test_type","test_result","test_mileage","postcode_area","make","model","colour","fuel_type","cylinder_capacity","first_use_date"
1279321653,634412746,"2021-01-01","4","NT","F","343513","WF","MERCEDES-BENZ","SPRINTER","YELLOW","DI","3000","2012-04-25"
1151376195,1234673114,"2021-01-01","4","NT","P","157010","TW","AUDI","A3","SILVER","PE","1781","2001-11-23"
895485279,911643456,"2021-01-01","4","NT","P","132796","TR","FORD","FOCUS","RED","PE","1596","2009-06-11"
639594363,687919491,"2021-01-01","4","NT","P","53782","NR","LEXUS","RX","WHITE","HY","3456","2015-03-06"
767539821,889650434,"2021-01-01","4","NT","PRS","103743","TS","ISUZU","TROOPER CITATION LWB","SILVER","DI","2999","2002-09-10"
1023430737,1453366760,"2021-01-01","4","NT","P","96932","NE","FIAT","DOBLO","RED","DI","1248","2013-02-18"
383703447,51917029,"2021-01-01","7","NT","P","99190","B","MERCEDES-BENZ","SPRINTER","WHITE","DI","2143","2017-10-19"
511648905,497689026,"2021-01-01","4","NT","P","115537","BS","FORD","MONDEO","GREY","DI","1997","2010-06-11"
1871921615,177413768,"2021-01-01","4","NT","P","100202","N","NISSAN","NOTE","SILVER","PE","1386","2011-06-30"

==> test_result_20220531131730_32360.csv <==
"test_id","vehicle_id","test_date","test_class_id","test_type","test_result","test_mileage","postcode_area","make","model","colour","fuel_type","cylinder_capacity","first_use_date"
1986430547,564890282,"2021-01-01","4","NT","PRS","117640","B","VOLKSWAGEN","TOURAN","BLACK","PE","1598","2006-03-23"
1474648715,722134102,"2021-01-01","4","NT","P","83331","ST","PEUGEOT","307","SILVER","PE","1360","2006-07-24"
195194135,1199350646,"2021-01-01","4","NT","P","50394","TQ","FORD","KA","SILVER","PE","1242","2012-04-27"
1171630471,184914876,"2021-01-01","4","RT","P","203055","HD","VOLKSWAGEN","GOLF","BLACK","DI","1968","2009-06-03"
1811357761,224020108,"2021-01-01","4","NT","P","116190","M","BMW","520","BLACK","DI","1995","2007-06-15"
787794097,115643448,"2021-01-01","4","NT","F","34237","BS","FORD","KUGA","SILVER","DI","1997","2012-07-31"
1380394059,1288867532,"2021-01-01","4","NT","P","145298","CB","FORD","FOCUS","BLACK","PE","1596","2008-10-31"
1124503143,838418611,"2021-01-01","4","NT","P","55606","IG","BMW","420","WHITE","DI","1995","2015-04-28"
484775853,241631414,"2021-01-01","4","NT","F","121005","WV","VOLKSWAGEN","GOLF","BLACK","PE","1390","2005-03-06"

==> test_result_20220531131730_32361.csv <==
"test_id","vehicle_id","test_date","test_class_id","test_type","test_result","test_mileage","postcode_area","make","model","colour","fuel_type","cylinder_capacity","first_use_date"
1831612037,761864998,"2021-01-01","4","NT","P","88619","B","VOLKSWAGEN","GOLF","SILVER","DI","1896","2007-03-02"
599284785,116850262,"2021-01-01","4","NT","F","113055","W","PEUGEOT","207","BLACK","PE","1397","2009-12-02"
1063939289,1032666631,"2021-01-01","4","NT","P","50137","W","RENAULT","CLIO","GREY","PE","899","2015-03-24"
552157457,821557154,"2021-01-01","4","NT","P","140074","B","TOYOTA","COROLLA","RED","PE","1794","2007-09-11"
424211999,8574225,"2021-01-01","4","NT","P","103539","GL","TOYOTA","HILUX","SILVER","DI","2982","2015-05-14"
1993248297,857979320,"2021-01-01","4","NT","PRS","248808","NG","FORD","GALAXY","SILVER","DI","1753","2007-03-30"
74066427,732750802,"2021-01-01","4","RT","P","137384","B","AUDI","A6","GREY","DI","2967","2006-05-26"
377084671,373505197,"2021-01-01","4","NT","PRS","130120","PE","HONDA","CIVIC","BROWN","DI","1597","2013-11-29"
1690230053,517503710,"2021-01-01","4","NT","F","131363","ST","VAUXHALL","ASTRA","BLACK","DI","1910","2007-12-08"

==> test_result_20220531131730_32365.csv <==
"test_id","vehicle_id","test_date","test_class_id","test_type","test_result","test_mileage","postcode_area","make","model","colour","fuel_type","cylinder_capacity","first_use_date"
1427521387,1275325852,"2021-01-01","4","NT","PRS","154859","B","TOYOTA","YARIS","BEIGE","PE","998","1999-10-06"
996557685,1104183300,"2021-01-01","4","RT","P","189669","W","VOLVO","900 Series","BLUE","PE","2316","1997-09-17"
868612227,128460869,"2021-01-01","4","RT","P","68897","CF","FORD","FIESTA","WHITE","PE","1242","2013-09-01"
276012265,1307831708,"2021-01-01","4","NT","P","78136","S","CITROEN","XSARA","SILVER","DI","1997","2001-11-01"
1508339517,722698972,"2021-01-01","4","NT","P","139570","B","FORD","GALAXY","BLACK","DI","1999","2010-03-24"
403957723,236826272,"2021-01-01","4","NT","P","2385","BS","AUSTIN","MINI MAYFAIR","GREEN","PE","1998","1990-08-24"
1252448601,69590214,"2021-01-01","4","NT","P","161781","NG","VOLVO","S40","WHITE","DI","1997","2009-09-29"
1636284975,229487022,"2021-01-01","4","NT","PRS","112789","B","FORD","S-MAX","BLUE","DI","1997","2009-09-01"
612721311,687872748,"2021-01-01","4","NT","P","107788","HA","NISSAN","NOTE","SILVER","PE","1598","2006-11-02"

==> test_result_20220531131730_32367.csv <==
"test_id","vehicle_id","test_date","test_class_id","test_type","test_result","test_mileage","postcode_area","make","model","colour","fuel_type","cylinder_capacity","first_use_date"
1972994021,372918378,"2021-01-01","4","NT","P","192005","DE","NISSAN","NAVARA","BLUE","DI","2488","2005-09-29"
1797921235,1410950684,"2021-01-01","4","RT","P","117283","YO","NISSAN","NAVARA","BLACK","DI","2488","2009-03-20"
949430357,494602622,"2021-01-01","4","NT","F","105987","TA","FORD","MONDEO","GREY","DI","1997","2009-06-01"
1542030319,158896024,"2021-01-01","4","NT","P","88713","GU","PEUGEOT","307","BLUE","PE","1360","2007-03-28"
1158193945,405359676,"2021-01-01","4","NT","P","109011","S","KIA","SPORTAGE","SILVER","DI","1995","2012-03-31"
821484899,318101814,"2021-01-01","4","NT","PRS","87002","B","VOLKSWAGEN","POLO","SILVER","PE","1390","2002-05-20"
1959557495,225828328,"2021-01-01","4","RT","P","136529","B","NISSAN","MICRA","RED","DI","1461","2006-06-15"
262575739,587455594,"2021-01-01","4","NT","P","104812","HA","MERCEDES-BENZ","E","GREY","DI","2143","2010-09-06"
390521197,917399585,"2021-01-01","4","NT","P","56359","E","MITSUBISHI","OUTLANDER","BLACK","HY","1998","2015-06-23"

==> test_result_20220531131730_32370.csv <==
"test_id","vehicle_id","test_date","test_class_id","test_type","test_result","test_mileage","postcode_area","make","model","colour","fuel_type","cylinder_capacity","first_use_date"
1447775663,762736456,"2021-01-01","4","RT","P","188466","TR","AUDI","A6","SILVER","DI","2698","2006-04-19"
855175701,108434218,"2021-01-01","4","NT","PRS","69061","B","RENAULT","CLIO","SILVER","PE","1390","2007-12-21"
1912430167,958268864,"2021-01-01","4","NT","PRS","119307","B","AUDI","A3","WHITE","DI","1968","2009-07-16"
1656539251,93119174,"2021-01-01","4","NT","P","91283","B","TOYOTA","COROLLA","BLACK","PE","1398","2006-10-31"
457902801,843004236,"2021-01-01","4","RT","P","235305","RG","FORD","TRANSIT","WHITE","DI","2198","2012-02-17"
713793717,1379084492,"2021-01-01","4","NT","P","134851","B","BMW","X5","GREY","DI","2993","2011-07-05"
1865302839,418842510,"2021-01-01","4","NT","P","160027","IG","VOLKSWAGEN","PASSAT","GREY","DI","1968","2009-07-15"
1818175511,168033852,"2021-01-01","7","NT","P","426591","B","MERCEDES-BENZ","SPRINTER","WHITE","DI","2148","2005-11-04"
1643102725,973895810,"2021-01-01","4","NT","F","29503","BS","RENAULT","CLIO","BLUE","PE","1149","1999-11-29"

==> test_result_20220531131730_32372.csv <==
"test_id","vehicle_id","test_date","test_class_id","test_type","test_result","test_mileage","postcode_area","make","model","colour","fuel_type","cylinder_capacity","first_use_date"
1205321273,1204453892,"2021-01-01","4","NT","F","81916","OX","VAUXHALL","CORSA","BLACK","PE","1229","2011-01-27"
1286139403,314898326,"2021-01-01","4","NT","P","52932","NG","FORD","FOCUS","RED","PE","1596","2011-03-19"
774357571,1338803624,"2021-01-01","4","NT","P","116405","CT","VAUXHALL","CORSA","SILVER","PE","998","2010-11-26"
1494902991,14313660,"2021-01-01","4","NT","P","152177","OL","VOLKSWAGEN","GOLF","BLACK","DI","1896","2003-04-25"
181757609,479742089,"2021-01-01","4","NT","P","17385","DN","MERCEDES-BENZ","C","SILVER","PE","1991","2016-07-21"
1575721121,1395768361,"2021-01-01","4","NT","P","55299","B","MERCEDES-BENZ","CITAN","WHITE","DI","1461","2016-11-28"
935993831,443613806,"2021-01-01","4","NT","P","59680","LN","MAZDA","2","SILVER","PE","1349","2008-03-29"
1319830205,845986202,"2021-01-01","4","NT","P","177288","SA","RENAULT","TRAFIC","BLACK","DI","1870","2003-03-26"
1400648335,1454890471,"2021-01-01","4","NT","P","61757","OX","VOLKSWAGEN","TRANSPORTER","GREY","DI","1968","2016-03-02"

==> test_result_20220531131730_32375.csv <==
"test_id","vehicle_id","test_date","test_class_id","test_type","test_result","test_mileage","postcode_area","make","model","colour","fuel_type","cylinder_capacity","first_use_date"
1353521007,1428275549,"2021-01-01","4","NT","P","25550","B","NISSAN","NOTE","SILVER","PE","1198","2016-11-14"
1272702877,808958468,"2021-01-01","4","NT","F","132516","BB","CITROEN","C4","BLACK","DI","1997","2008-04-22"
410775473,1336517888,"2021-01-01","4","NT","P","84448","TW","BMW","116","BLUE","PE","1596","2010-03-17"
1979811771,1290599648,"2021-01-01","4","RT","P","238050","BS","PORSCHE","CAYENNE","BLACK","PE","3189","2005-06-16"
1468029939,608708764,"2021-01-01","4","RT","P","135797","IG","FIAT","PUNTO","BLACK","PE","1242","2006-06-27"
316520817,1277248145,"2021-01-02","4","NT","P","57968","TN","SKODA","FABIA","RED","PE","1197","2015-07-16"
60629901,764751148,"2021-01-02","4","NT","P","79761","PE","PEUGEOT","BIPPER","RED","DI","1397","2010-02-08"
1292957153,964933515,"2021-01-02","4","NT","P","29808","LU","VOLVO","V40","BLACK","DI","1969","2016-01-29"
1629666199,141895556,"2021-01-02","4","NT","P","157456","SS","TOYOTA","RAV4","BLUE","PE","1998","1997-01-31"

==> test_result_20220531131730_32378.csv <==
"test_id","vehicle_id","test_date","test_class_id","test_type","test_result","test_mileage","postcode_area","make","model","colour","fuel_type","cylinder_capacity","first_use_date"
808048373,1167625608,"2021-01-01","4","NT","PRS","130084","CV","FORD","FOCUS","GREY","DI","1560","2006-01-18"
1737357381,248086777,"2021-01-01","4","NT","P","118472","B","VOLKSWAGEN","GOLF","SILVER","DI","1598","2012-10-30"
505030129,1270356948,"2021-01-01","4","NT","P","100569","WF","MERCEDES-BENZ","C","WHITE","DI","2987","2010-03-31"
585848259,112244534,"2021-01-01","4","NT","P","51209","S","SEAT","IBIZA","BLACK","PE","1390","2007-06-21"
26939099,880738429,"2021-01-01","4","NT","P","60897","BD","TOYOTA","ESTIMA","WHITE","HY","2362","2019-07-01"
875429977,1053590438,"2021-01-01","4","NT","PRS","225196","B","HONDA","CIVIC","BLUE","DI","2204","2007-01-13"
1259266351,775389341,"2021-01-01","4","NT","P","106839","PR","VAUXHALL","ASTRA","BLACK","DI","1686","2013-12-31"
1515157267,38626000,"2021-01-01","4","NT","P","135200","BB","SEAT","IBIZA","BLACK","DI","1598","2010-05-18"
363648145,599995984,"2021-01-01","4","NT","P","130800","B","SUBARU","IMPREZA","BLUE","PE","1994","2001-09-01"

==> test_result_20220531131730_32384.csv <==
"test_id","vehicle_id","test_date","test_class_id","test_type","test_result","test_mileage","postcode_area","make","model","colour","fuel_type","cylinder_capacity","first_use_date"
1373775283,1152701668,"2021-01-02","4","NT","PRS","111961","IG","MERCEDES-BENZ","S-Class","BLACK","PE","3199","2001-07-27"
94320703,590801779,"2021-01-02","4","NT","P","22838","SE","BMW","3 SERIES","BLUE","DI","1995","2015-12-30"
1724119829,448185815,"2021-01-02","4","RT","P","157346","HU","MERCEDES-BENZ","VITO","BLUE","DI","1598","2015-12-31"
1952938719,459450004,"2021-01-02","4","NT","P","125054","ME","PEUGEOT","407","SILVER","DI","1997","2008-09-30"
1232592273,361694814,"2021-01-02","4","RT","P","130921","SS","VOLKSWAGEN","GOLF","SILVER","PE","1197","2010-09-16"
1057320513,1097794689,"2021-01-02","4","NT","P","139127","TW","FORD","TRANSIT","WHITE","DI","2198","2014-09-30"
1091210289,352834300,"2021-01-02","4","RT","P","42119","DN","FORD","FOCUS","GREEN","PE","2522","2010-09-04"
357029369,271730714,"2021-01-02","4","NT","F","88105","RH","SKODA","FABIA","GREY","DI","1598","2012-03-29"
498411353,45784180,"2021-01-02","4","NT","F","78730","DN","LAND ROVER","RANGE ROVER","GREY","DI","3630","2006-11-30"

==> test_result_20220531131730_32386.csv <==
"test_id","vehicle_id","test_date","test_class_id","test_type","test_result","test_mileage","postcode_area","make","model","colour","fuel_type","cylinder_capacity","first_use_date"
1676793527,856620044,"2021-01-02","4","NT","P","90293","CF","AUDI","A4","BLACK","DI","1968","2008-04-22"
1037066237,325745083,"2021-01-02","4","NT","P","118982","SL","VOLKSWAGEN","GOLF","BLACK","DI","1968","2014-07-01"
525284405,877659358,"2021-01-02","4","NT","P","94618","SG","MAZDA","5","GREY","PE","1998","2009-09-15"
175138833,921349839,"2021-01-02","4","NT","P","19887","BN","PEUGEOT","208","GREY","PE","1200","2015-06-30"
1872120589,348180150,"2021-01-02","4","NT","P","135914","PE","AUDI","A3","BLACK","PE","1390","2008-07-18"
431029749,1367522568,"2021-01-02","4","NT","P","120012","UB","BMW","3 SERIES","BLUE","PE","1995","2008-12-17"
612920285,124080203,"2021-01-02","4","NT","P","79667","UB","LAND ROVER","RANGE ROVER","GOLD","DI","2993","2014-12-19"
787993071,15120336,"2021-01-02","4","NT","P","123628","SM","AUDI","A6","SILVER","DI","1968","2011-06-15"
451284025,72068234,"2021-01-02","4","NT","P","153516","SW","FORD","FIESTA","GREEN","PE","1242","2007-05-18"

The contents of the files look correct for 2021, but the format has changed again. The delimiter is now a comma, and every column except test_id and vehicle_id is wrapped in double quotes, even numeric ones such as test_mileage. This will need to be taken into account when importing. These files total 4.2GB, considerably more than 2022 and 2023, but the extra quote characters would account for this.
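Since both the delimiter and the quoting differ between years, any import has to cope with both. DuckDB’s CSV reader has its own dialect auto-detection. As a plain Python illustration of the idea, though, the delimiter can be picked from the header line and the csv module left to strip the quotes. This sketch assumes only the two delimiters seen so far (pipe and comma); the function names are hypothetical.

```python
import csv
import io


def detect_delimiter(header_line: str) -> str:
    """Choose '|' or ',' depending on which appears more often in the header."""
    return "|" if header_line.count("|") > header_line.count(",") else ","


def read_rows(text: str):
    """Parse MOT result text in any of the layouts seen so far.

    csv.reader removes the double quotes used in the 2021 files, so quoted
    and unquoted inputs yield the same lists of strings.
    """
    first_line = text.split("\n", 1)[0]
    delimiter = detect_delimiter(first_line)
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))
```

For example, a quoted 2021 line such as `1488085241,"VOLKSWAGEN","113094"` and its pipe-delimited equivalent `1488085241|VOLKSWAGEN|113094` both come back as `["1488085241", "VOLKSWAGEN", "113094"]`.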

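Checking the number of rows (this was run after the rename further down, which is why the directory and file names already use 2021)…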

simon@NUC:~/Documents/mot_data/test_result_2021$ wc -l *
   3362529 test_result_2021_32355.csv
   3362450 test_result_2021_32357.csv
   3368008 test_result_2021_32360.csv
   3363319 test_result_2021_32361.csv
   3366233 test_result_2021_32365.csv
   3367302 test_result_2021_32367.csv
   3363390 test_result_2021_32370.csv
   3362481 test_result_2021_32372.csv
   3363314 test_result_2021_32375.csv
   3369258 test_result_2021_32378.csv
   3367463 test_result_2021_32384.csv
   3364911 test_result_2021_32386.csv
  40380658 total

That’s 40.4 million lines, similar to 2022 and 2023, which supports the idea that the larger file size comes only from the quoting.
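The quoting explanation can be checked with some simple arithmetic. This uses the exact byte counts from the ls -l listing after the rename, together with the line counts from wc -l. The Python sketch assumes each line quotes 12 of its 14 fields (24 extra bytes); the header, which quotes all 14, is treated the same because the difference is negligible.

```python
# Figures copied from the ls -l and wc -l output in this post.
SIZE_2023 = 3_661_104_239        # bytes, single pipe-delimited file
LINES_2023 = 42_216_722
SIZES_2021 = [
    370_591_916, 370_584_177, 371_195_101, 370_681_200,
    371_006_342, 371_116_837, 370_669_369, 370_581_748,
    370_670_092, 371_339_397, 371_123_609, 370_865_904,
]
LINES_2021 = 40_380_658

# Assumption: 12 quoted fields per line, two '"' characters each.
QUOTE_BYTES_PER_LINE = 12 * 2


def bytes_per_line(total_bytes: int, lines: int) -> float:
    return total_bytes / lines


size_2021 = sum(SIZES_2021)
unquoted_2021 = size_2021 - QUOTE_BYTES_PER_LINE * LINES_2021

print(f"2023:                  {bytes_per_line(SIZE_2023, LINES_2023):.1f} bytes/line")
print(f"2021 as shipped:       {bytes_per_line(size_2021, LINES_2021):.1f} bytes/line")
print(f"2021 minus the quotes: {bytes_per_line(unquoted_2021, LINES_2021):.1f} bytes/line")
```

With these numbers, the 2021 files come out at about 110 bytes per line. That drops to about 86 once the quotes are removed, close to the roughly 87 bytes per line of the 2023 file.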

A quick rename will keep the files and directory consistent with the other years…

simon@NUC:~/Documents/mot_data/test_result_2022$ rename 's/20220531131730/2021/' *
simon@NUC:~/Documents/mot_data/test_result_2022$ ls -l
total 4346148
-rw-r--r-- 1 simon simon 370591916 May 31  2022 test_result_2021_32355.csv
-rw-r--r-- 1 simon simon 370584177 May 31  2022 test_result_2021_32357.csv
-rw-r--r-- 1 simon simon 371195101 May 31  2022 test_result_2021_32360.csv
-rw-r--r-- 1 simon simon 370681200 May 31  2022 test_result_2021_32361.csv
-rw-r--r-- 1 simon simon 371006342 May 31  2022 test_result_2021_32365.csv
-rw-r--r-- 1 simon simon 371116837 May 31  2022 test_result_2021_32367.csv
-rw-r--r-- 1 simon simon 370669369 May 31  2022 test_result_2021_32370.csv
-rw-r--r-- 1 simon simon 370581748 May 31  2022 test_result_2021_32372.csv
-rw-r--r-- 1 simon simon 370670092 May 31  2022 test_result_2021_32375.csv
-rw-r--r-- 1 simon simon 371339397 May 31  2022 test_result_2021_32378.csv
-rw-r--r-- 1 simon simon 371123609 May 31  2022 test_result_2021_32384.csv
-rw-r--r-- 1 simon simon 370865904 May 31  2022 test_result_2021_32386.csv
simon@NUC:~/Documents/mot_data/test_result_2022$ cd ..
simon@NUC:~/Documents/mot_data$ mv test_result_2022 test_result_2021
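The Perl-based rename used above isn’t installed everywhere. util-linux also ships a rename command, with a different syntax. If you’d rather script this step, here is a Python sketch of the same rename and mv. The function name and its default arguments are assumptions chosen to match this one archive.

```python
from pathlib import Path


def normalise_2021_folder(parent, old_stamp="20220531131730", year="2021"):
    """Mirror `rename 's/20220531131730/2021/' *` then `mv test_result_2022 test_result_2021`.

    Returns the path of the renamed folder.
    """
    src = Path(parent).expanduser() / "test_result_2022"
    dst = src.with_name(f"test_result_{year}")
    if dst.exists():
        raise FileExistsError(dst)
    # list() first so entries are not renamed while the directory is being iterated
    for path in list(src.iterdir()):
        if old_stamp in path.name:
            # replace only the first occurrence, like s/// without the g flag
            path.rename(path.with_name(path.name.replace(old_stamp, year, 1)))
    src.rename(dst)
    return dst
```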

2020

simon@NUC:~/Documents/mot_data$ unzip dft_test_result_2020.zip 
Archive:  dft_test_result_2020.zip
  inflating: dft_test_result-from-2020-01-01_00-00-00-to-2020-04-01_00-00-00.csv  
  inflating: dft_test_result-from-2020-04-01_00-00-00-to-2020-07-01_00-00-00.csv  
  inflating: dft_test_result-from-2020-07-01_00-00-00-to-2020-10-01_00-00-00.csv  
  inflating: dft_test_result-from-2020-10-01_00-00-00-to-2021-01-01_00-00-00.csv  
simon@NUC:~/Documents/mot_data$ ls -lh dft_test_result-from-2020-*
-rw-r--r-- 1 simon simon 830M Mar 18  2021 dft_test_result-from-2020-01-01_00-00-00-to-2020-04-01_00-00-00.csv
-rw-r--r-- 1 simon simon 439M Mar 18  2021 dft_test_result-from-2020-04-01_00-00-00-to-2020-07-01_00-00-00.csv
-rw-r--r-- 1 simon simon 914M Mar 18  2021 dft_test_result-from-2020-07-01_00-00-00-to-2020-10-01_00-00-00.csv
-rw-r--r-- 1 simon simon 986M Mar 18  2021 dft_test_result-from-2020-10-01_00-00-00-to-2021-01-01_00-00-00.csv

OK, different file naming again, this time one file per quarter. The total of 3.1GB is in line with 2022 and 2023.
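Each 2020 file name encodes the period it covers, so it’s easy to check that the quarters join up with no gaps or overlaps. Here is a Python sketch. The regular expression is written to match only the from-…-to-… naming seen here, and the function names are illustrative.

```python
import re
from datetime import datetime

# Matches e.g. dft_test_result-from-2020-01-01_00-00-00-to-2020-04-01_00-00-00.csv
PATTERN = re.compile(
    r"from-(\d{4}-\d{2}-\d{2}_\d{2}-\d{2}-\d{2})-to-(\d{4}-\d{2}-\d{2}_\d{2}-\d{2}-\d{2})\.csv$"
)
FMT = "%Y-%m-%d_%H-%M-%S"


def date_ranges(filenames):
    """Return the (start, end) datetimes encoded in the file names, sorted by start."""
    ranges = []
    for name in filenames:
        m = PATTERN.search(name)
        if m is None:
            raise ValueError(f"unexpected file name: {name}")
        ranges.append((datetime.strptime(m[1], FMT), datetime.strptime(m[2], FMT)))
    return sorted(ranges)


def gaps(ranges):
    """List (end, next_start) pairs where a file doesn't start where the previous one ended."""
    return [(a[1], b[0]) for a, b in zip(ranges, ranges[1:]) if a[1] != b[0]]
```

For the four 2020 files, `gaps` should return an empty list, with coverage running from 2020-01-01 to 2021-01-01.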

simon@NUC:~/Documents/mot_data$ wc -l dft_test_result-from-2020-*
  10104426 dft_test_result-from-2020-01-01_00-00-00-to-2020-04-01_00-00-00.csv
   5362646 dft_test_result-from-2020-04-01_00-00-00-to-2020-07-01_00-00-00.csv
  11137837 dft_test_result-from-2020-07-01_00-00-00-to-2020-10-01_00-00-00.csv
  11989108 dft_test_result-from-2020-10-01_00-00-00-to-2021-01-01_00-00-00.csv
  38594017 total
simon@NUC:~/Documents/mot_data$ head dft_test_result-from-2020-*
==> dft_test_result-from-2020-01-01_00-00-00-to-2020-04-01_00-00-00.csv <==
test_id,vehicle_id,test_date,test_class_id,test_type,test_result,test_mileage,postcode_area,make,model,colour,fuel_type,cylinder_capacity,first_use_date
666422869,1253657552,2020-01-01,4,NT,P,63975,TR,CITROEN,DISPATCH,WHITE,DI,1560,2011-03-14
623774383,51021182,2020-01-01,4,NT,P,107361,NN,SEAT,IBIZA,YELLOW,PE,1390,2008-12-18
581125897,612989654,2020-01-01,4,NT,P,73160,NN,MERCEDES,A 150,SILVER,PE,1498,2007-09-28
538477411,458058688,2020-01-01,4,NT,P,,TR,CITROEN,DISPATCH,WHITE,DI,1868,2004-11-19
325234981,1422080365,2020-01-01,1,NT,F,27120,SS,KTM,125,ORANGE,PE,125,2013-12-07
367883467,1254023710,2020-01-01,4,NT,P,81260,RM,FORD,FOCUS,BLUE,PE,1596,2005-06-13
453180439,1266564042,2020-01-01,4,NT,P,93426,CA,SKODA,FABIA,GREY,DI,1598,2012-06-21
282586495,341436608,2020-01-01,4,NT,P,127237,B,VAUXHALL,INSIGNIA,WHITE,DI,1956,2010-09-10
154641037,504524381,2020-01-01,4,NT,P,109759,B,SKODA,RAPID,RED,DI,1598,2014-12-19

==> dft_test_result-from-2020-04-01_00-00-00-to-2020-07-01_00-00-00.csv <==
test_id,vehicle_id,test_date,test_class_id,test_type,test_result,test_mileage,postcode_area,make,model,colour,fuel_type,cylinder_capacity,first_use_date
677835507,1044704117,2020-04-01,4,RT,P,50331,M,PEUGEOT,EXPERT,RED,DI,1560,2015-10-31
763132479,1217941099,2020-04-01,7,NT,P,156078,WA,MERCEDES-BENZ,SPRINTER,WHITE,DI,2143,2014-03-31
635187021,503571165,2020-04-01,7,NT,P,104440,BD,MERCEDES-BENZ,SPRINTER,WHITE,DI,2143,2016-04-20
592538535,1399375571,2020-04-01,4,NT,P,34837,IP,VOLKSWAGEN,POLO,WHITE,PE,999,2016-04-30
549890049,1168611002,2020-04-01,4,NT,P,76149,CV,RENAULT,MEGANE,WHITE,DI,1461,2011-09-14
464593077,221689772,2020-04-01,4,NT,P,68683,S,FORD,FIESTA,WHITE,PE,1242,2012-03-20
421944591,384337924,2020-04-01,4,NT,P,120364,CV,NISSAN,NAVARA,SILVER,DI,2488,2005-11-30
507241563,1070427905,2020-04-01,4,NT,P,123609,WF,MERCEDES-BENZ,SPRINTER,YELLOW,DI,2987,2015-06-09
379296105,1218623039,2020-04-01,7,RT,P,58995,HU,MERCEDES-BENZ,SPRINTER,SILVER,DI,2143,2015-04-20

==> dft_test_result-from-2020-07-01_00-00-00-to-2020-10-01_00-00-00.csv <==
test_id,vehicle_id,test_date,test_class_id,test_type,test_result,test_mileage,postcode_area,make,model,colour,fuel_type,cylinder_capacity,first_use_date
534152707,749768485,2020-07-01,4,NT,P,44038,DN,RENAULT,KADJAR,BLACK,DI,1598,2015-08-28
619449679,1316673618,2020-07-01,4,NT,P,46052,OL,MINI,MINI (R58),RED,PE,1598,2012-08-10
662098165,1087081109,2020-07-01,4,NT,P,25513,NE,MERCEDES-BENZ,GLA,GREY,DI,2143,2017-02-16
235613305,1150300382,2020-07-01,4,RT,P,56624,B,VAUXHALL,COMBO,RED,DI,1248,2011-06-15
576801193,675795911,2020-07-01,4,NT,P,12352,LU,SKODA,CITIGO,WHITE,PE,999,2016-10-31
320910277,1383430983,2020-07-01,4,NT,F,67938,DN,AUDI,A4,BLACK,DI,1968,2016-11-16
448855735,187152929,2020-07-01,4,NT,P,86544,LU,MITSUBISHI,OUTLANDER,BLACK,HY,1998,2015-11-02
491504221,378989977,2020-07-01,7,NT,F,48171,HU,MERCEDES-BENZ,VITO,WHITE,DI,2143,2013-09-26
363558763,1451451149,2020-07-01,4,NT,P,20213,LE,FORD,MUSTANG,BLUE,PE,4951,2017-08-26

==> dft_test_result-from-2020-10-01_00-00-00-to-2021-01-01_00-00-00.csv <==
test_id,vehicle_id,test_date,test_class_id,test_type,test_result,test_mileage,postcode_area,make,model,colour,fuel_type,cylinder_capacity,first_use_date
840415303,1367588451,2020-10-01,4,NT,P,36788,LE,FORD,TRANSIT,SILVER,DI,2198,2014-10-03
797766817,146223609,2020-10-01,4,NT,P,21856,LU,NISSAN,JUKE,BLUE,PE,1618,2017-02-03
968360761,144911119,2020-10-01,4,NT,P,45041,LE,MINI,PACEMAN,BLACK,PE,1598,2015-03-26
755118331,740243933,2020-10-01,4,NT,P,30389,LU,FORD,FOCUS,BLACK,PE,1999,2017-11-22
1224251677,1368373019,2020-10-01,4,NT,P,38749,LU,LAND ROVER,RANGE ROVER EVOQUE,BLACK,DI,1999,2016-09-27
243336499,359109671,2020-10-01,4,RT,P,45397,LU,PEUGEOT,2008,WHITE,DI,1398,2014-06-30
158039527,163738397,2020-10-01,4,RT,P,93462,LU,MERCEDES-BENZ,C,GREY,DI,2143,2015-05-05
456578929,703774257,2020-10-01,4,NT,P,33700,HD,AUDI,A6,BLACK,DI,1968,2017-06-30
584524387,1498372684,2020-10-01,4,NT,PRS,173053,E,VAUXHALL,CORSAVAN,BLUE,DI,1686,2001-12-14

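That’s a total of 38.6 million tests (38,594,017 lines minus four headers), and the head of each file looks good. The format differs yet again, though: the delimiter is a comma, as in 2021, but the values aren’t quoted.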
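The first 2020 file also shows that test_mileage can be empty: the CITROEN DISPATCH row has nothing between the two commas where the mileage should be. Any conversion to numeric types has to allow for that. Here is a Python sketch that turns empty strings into None. The column names come from the header; the function names and the choice of which columns to convert are assumptions.

```python
import csv
from datetime import date


def parse_result(row: dict) -> dict:
    """Convert one MOT result row (all strings, from csv.DictReader) to Python types.

    Empty optional fields, such as a missing test_mileage, become None.
    """
    def as_int(value):
        return int(value) if value != "" else None

    def as_date(value):
        return date.fromisoformat(value) if value != "" else None

    return {
        **row,
        "test_id": int(row["test_id"]),
        "vehicle_id": int(row["vehicle_id"]),
        "test_date": date.fromisoformat(row["test_date"]),
        "test_class_id": as_int(row["test_class_id"]),
        "test_mileage": as_int(row["test_mileage"]),
        "cylinder_capacity": as_int(row["cylinder_capacity"]),
        "first_use_date": as_date(row["first_use_date"]),
    }


def read_results(f, delimiter=","):
    """Yield typed rows from an open text file with a header line."""
    for row in csv.DictReader(f, delimiter=delimiter):
        yield parse_result(row)
```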

2019

simon@NUC:~/Documents/mot_data$ unzip dft_test_result_2019.zip 
Archive:  dft_test_result_2019.zip
  inflating: dft_test_result-from-2019-04-01_00-00-01-to-2019-07-01_00-00-01.csv  
   creating: __MACOSX/
  inflating: __MACOSX/._dft_test_result-from-2019-04-01_00-00-01-to-2019-07-01_00-00-01.csv  
  inflating: dft_test_result-from-2019-10-01_00-00-01-to-2020-01-01_00-00-01.csv  
  inflating: __MACOSX/._dft_test_result-from-2019-10-01_00-00-01-to-2020-01-01_00-00-01.csv  
  inflating: dft_test_result-from-2019-07-01_00-00-01-to-2019-10-01_00-00-01.csv  
  inflating: __MACOSX/._dft_test_result-from-2019-07-01_00-00-01-to-2019-10-01_00-00-01.csv  
  inflating: dft_test_result-from-2019-01-01_00-00-01-to-2019-04-01_00-00-01.csv  
  inflating: __MACOSX/._dft_test_result-from-2019-01-01_00-00-01-to-2019-04-01_00-00-01.csv  

This year’s archive was obviously created on a Mac: alongside the CSVs it contains a __MACOSX folder of ‘._’ metadata files. Let’s remove it…

simon@NUC:~/Documents/mot_data$ rm -rf __MACOSX/
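The metadata can also be skipped at extraction time rather than deleted afterwards. unzip’s -x option excludes matching members (for example `-x '__MACOSX/*'`), and Python’s zipfile module can do the same by filtering entries. Here is a minimal sketch; the function name is hypothetical.

```python
import zipfile
from pathlib import Path


def extract_without_macos_metadata(archive_path, dest="."):
    """Extract a zip, skipping the __MACOSX/ folder and any '._' AppleDouble files.

    Returns the paths of the files that were written.
    """
    extracted = []
    with zipfile.ZipFile(archive_path) as zf:
        for info in zf.infolist():
            parts = Path(info.filename).parts
            if parts and (parts[0] == "__MACOSX" or parts[-1].startswith("._")):
                continue
            extracted.append(zf.extract(info, dest))
    return extracted
```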

And the initial data checks…

simon@NUC:~/Documents/mot_data$ ls -lh dft_test_result-from-2019-*
-rw-r--r-- 1 simon simon 842M May 11  2020 dft_test_result-from-2019-01-01_00-00-01-to-2019-04-01_00-00-01.csv
-rw-r--r-- 1 simon simon 863M May 11  2020 dft_test_result-from-2019-04-01_00-00-01-to-2019-07-01_00-00-01.csv
-rw-r--r-- 1 simon simon 840M May 11  2020 dft_test_result-from-2019-07-01_00-00-01-to-2019-10-01_00-00-01.csv
-rw-r--r-- 1 simon simon 696M May 11  2020 dft_test_result-from-2019-10-01_00-00-01-to-2020-01-01_00-00-01.csv
simon@NUC:~/Documents/mot_data$ wc -l dft_test_result-from-2019-*
  10210087 dft_test_result-from-2019-01-01_00-00-01-to-2019-04-01_00-00-01.csv
  10466892 dft_test_result-from-2019-04-01_00-00-01-to-2019-07-01_00-00-01.csv
  10194305 dft_test_result-from-2019-07-01_00-00-01-to-2019-10-01_00-00-01.csv
   8439418 dft_test_result-from-2019-10-01_00-00-01-to-2020-01-01_00-00-01.csv
  39310702 total
simon@NUC:~/Documents/mot_data$ head dft_test_result-from-2019-*
==> dft_test_result-from-2019-01-01_00-00-01-to-2019-04-01_00-00-01.csv <==
test_id,vehicle_id,test_date,test_class_id,test_type,test_result,test_mileage,postcode_area,make,model,colour,fuel_type,cylinder_capacity,first_use_date
1930167913,1168220651,2019-01-01,4,NT,P,47108,LL,LAND ROVER,DISCOVERY,WHITE,DI,2993,2014-07-29
1887519427,608494756,2019-01-01,4,NT,P,74254,RM,VAUXHALL,COMBO,BLUE,DI,1686,2000-10-16
1844870941,345838224,2019-01-01,4,NT,P,52596,RM,SMART (MCC),FORTWO COUPE,BLACK,PE,999,2010-06-30
1802222455,712515370,2019-01-01,4,NT,F,97925,S,KIA,CEED,BLUE,DI,1582,2007-10-31
1631628511,929718858,2019-01-01,4,RT,P,91055,BB,TOYOTA,YARIS,RED,PE,998,2002-11-11
1588980025,228077478,2019-01-01,4,RT,P,69520,BN,HYUNDAI,COUPE,SILVER,PE,1975,2006-06-27
1205143651,614637102,2019-01-01,4,RT,P,62554,CA,NISSAN,JUKE,RED,PE,1598,2011-03-09
1759573969,618829162,2019-01-01,4,NT,P,56880,TW,PEUGEOT,308 S AUTO,GREY,PE,1598,2008-11-19
1674276997,682893232,2019-01-01,4,NT,F,80949,S,ALFA ROMEO,MITO,BLACK,PE,1368,2009-05-25

==> dft_test_result-from-2019-04-01_00-00-01-to-2019-07-01_00-00-01.csv <==
test_id,vehicle_id,test_date,test_class_id,test_type,test_result,test_mileage,postcode_area,make,model,colour,fuel_type,cylinder_capacity,first_use_date
949308439,831850247,2019-04-01,4,NT,P,11495,LU,FORD,FIESTA,WHITE,PE,998,2016-06-06
864011467,1337370923,2019-04-01,7,NT,P,144959,S,MERCEDES-BENZ,SPRINTER,SILVER,DI,2143,2015-06-01
906659953,134104785,2019-04-01,4,NT,F,13234,LU,RENAULT,CAPTUR,CREAM,DI,1461,2014-05-09
821362981,697598213,2019-04-01,4,NT,P,47127,S,KIA,RIO,BLUE,DI,1396,2013-03-04
778714495,1033590608,2019-04-01,4,NT,P,46895,LU,VAUXHALL,CORSA,GREY,PE,1229,2012-03-12
736066009,928095205,2019-04-01,4,NT,P,58118,DA,HONDA,CIVIC,BLUE,DI,1597,2016-04-13
693417523,15334170,2019-04-01,4,NT,F,57765,DA,HYUNDAI,AMICA,SILVER,PE,1086,2007-04-30
650769037,1285097458,2019-04-01,4,NT,P,114576,SO,TOYOTA,COROLLA,SILVER,PE,1598,2002-04-12
53690233,211955150,2019-04-01,4,NT,ABR,"",DN,FORD,KA,BLUE,PE,1299,2003-03-31

==> dft_test_result-from-2019-07-01_00-00-01-to-2019-10-01_00-00-01.csv <==
test_id,vehicle_id,test_date,test_class_id,test_type,test_result,test_mileage,postcode_area,make,model,colour,fuel_type,cylinder_capacity,first_use_date
82833359,970855005,2019-07-01,4,NT,P,10702,LU,KIA,PICANTO,YELLOW,PE,998,2015-09-22
1997536387,711499399,2019-07-01,4,NT,P,63103,LU,VAUXHALL,INSIGNIA,BLACK,DI,1956,2014-10-24
1954887901,711499399,2019-07-01,4,NT,ABR,"",LU,VAUXHALL,INSIGNIA,BLACK,DI,1956,2014-10-24
1912239415,384548675,2019-07-01,4,RT,P,18253,LU,DS,DS3,WHITE,DI,1560,2016-07-09
1869590929,96388074,2019-07-01,4,NT,P,77656,CV,CITROEN,XSARA,SILVER,PE,1587,2003-06-11
1826942443,595424953,2019-07-01,4,NT,P,60312,CV,NISSAN,JUKE,WHITE,DI,1461,2012-11-01
1741645471,1368159163,2019-07-01,7,NT,P,102797,BS,RENAULT,MASTER,BLACK,DI,2299,2014-04-25
1698996985,997585230,2019-07-01,4,NT,P,59156,LU,NISSAN,QASHQAI,BLACK,PE,1598,2011-07-06
1656348499,1431210186,2019-07-01,4,NT,P,195441,SS,FORD,MONDEO,SILVER,DI,1997,2009-06-02

==> dft_test_result-from-2019-10-01_00-00-01-to-2020-01-01_00-00-01.csv <==
test_id,vehicle_id,test_date,test_class_id,test_type,test_result,test_mileage,postcode_area,make,model,colour,fuel_type,cylinder_capacity,first_use_date
1382999721,1185783101,2019-10-01,4,NT,P,30624,G,AUDI,A6,WHITE,DI,1968,2016-09-30
1212405777,1032453651,2019-10-01,4,NT,P,37795,ME,BMW,520,GREY,DI,1995,2015-04-29
1297702749,1339540147,2019-10-01,7,NT,P,35083,LE,FORD,RANGER,SILVER,DI,3198,2016-09-27
1127108805,300811285,2019-10-01,4,NT,P,36582,NG,BMW,420,WHITE,DI,1995,2017-02-02
1255054263,1493297193,2019-10-01,7,NT,P,15558,SW,FORD,TRANSIT,WHITE,DI,1995,2016-11-30
1169757291,147054239,2019-10-01,4,NT,PRS,29499,OL,VAUXHALL,ASTRA,BLACK,PE,999,2016-09-01
1084460319,454568331,2019-10-01,4,NT,P,37519,ME,BMW,118,GREY,DI,1995,2017-01-20
1041811833,270177294,2019-10-01,4,NT,P,163104,SK,LAND ROVER,FREELANDER,SILVER,DI,1951,2003-11-27
828569403,464178463,2019-10-01,4,NT,F,52844,ME,CITROEN,C4,WHITE,DI,1560,2013-12-03

Around 3.2GB of files this time and 39.3 million rows, and the heads of the files all look good, again with a comma as the delimiter.
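Note that `wc -l` counts newline characters, so each total includes one header line per file. Here is a minimal Python sketch of the same check that reports data rows instead. It assumes every file ends with a newline and that no quoted field spans multiple lines. `count_newlines` and `data_rows` are hypothetical helpers:

```python
import glob


def count_newlines(path, chunk_size=1 << 20):
    """Count newline bytes the way `wc -l` does, reading 1 MiB at a time."""
    n = 0
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            n += chunk.count(b"\n")
    return n


def data_rows(pattern):
    """Sum newline counts over the matching files, minus one header per file."""
    paths = sorted(glob.glob(pattern))
    return sum(count_newlines(p) - 1 for p in paths)
```

For example, `data_rows("dft_test_result-from-2019-*")` would give the 2019 total without the four header lines.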

2018

simon@NUC:~/Documents/mot_data$ unzip dft_test_result_2018.zip 
Archive:  dft_test_result_2018.zip
  inflating: dft_test_result-from-2018-01-01_00-00-01-to-2018-04-01_00-00-01.csv  
   creating: __MACOSX/
  inflating: __MACOSX/._dft_test_result-from-2018-01-01_00-00-01-to-2018-04-01_00-00-01.csv  
  inflating: dft_test_result-from-2018-10-01_00-00-01-to-2019-01-01_00-00-01.csv  
  inflating: __MACOSX/._dft_test_result-from-2018-10-01_00-00-01-to-2019-01-01_00-00-01.csv  
  inflating: dft_test_result-from-2018-07-01_00-00-01-to-2018-10-01_00-00-01.csv  
  inflating: __MACOSX/._dft_test_result-from-2018-07-01_00-00-01-to-2018-10-01_00-00-01.csv  
  inflating: dft_test_result-from-2018-04-01_00-00-01-to-2018-07-01_00-00-01.csv  
  inflating: __MACOSX/._dft_test_result-from-2018-04-01_00-00-01-to-2018-07-01_00-00-01.csv  
simon@NUC:~/Documents/mot_data$ rm -rf __MACOSX/
simon@NUC:~/Documents/mot_data$ ls -lh dft_test_result-from-2018-*
-rw-r--r-- 1 simon simon 821M May 11  2020 dft_test_result-from-2018-01-01_00-00-01-to-2018-04-01_00-00-01.csv
-rw-r--r-- 1 simon simon 863M May 11  2020 dft_test_result-from-2018-04-01_00-00-01-to-2018-07-01_00-00-01.csv
-rw-r--r-- 1 simon simon 818M May 11  2020 dft_test_result-from-2018-07-01_00-00-01-to-2018-10-01_00-00-01.csv
-rw-r--r-- 1 simon simon 690M May 11  2020 dft_test_result-from-2018-10-01_00-00-01-to-2019-01-01_00-00-01.csv
simon@NUC:~/Documents/mot_data$ wc -l dft_test_result-from-2018-*
   9950816 dft_test_result-from-2018-01-01_00-00-01-to-2018-04-01_00-00-01.csv
  10457352 dft_test_result-from-2018-04-01_00-00-01-to-2018-07-01_00-00-01.csv
   9916657 dft_test_result-from-2018-07-01_00-00-01-to-2018-10-01_00-00-01.csv
   8356980 dft_test_result-from-2018-10-01_00-00-01-to-2019-01-01_00-00-01.csv
  38681805 total
simon@NUC:~/Documents/mot_data$ head dft_test_result-from-2018-*
==> dft_test_result-from-2018-01-01_00-00-01-to-2018-04-01_00-00-01.csv <==
test_id,vehicle_id,test_date,test_class_id,test_type,test_result,test_mileage,postcode_area,make,model,colour,fuel_type,cylinder_capacity,first_use_date
65820879,600511556,2018-01-01,4,NT,P,224076,RH,VAUXHALL,ZAFIRA,GREY,DI,1686,2012-05-31
782304899,388943826,2018-01-01,4,RT,P,111810,WR,VOLKSWAGEN,POLO,SILVER,PE,1198,2005-07-29
824953385,658134358,2018-01-01,4,NT,P,94665,DE,LAND ROVER,UNCLASSIFIED,PINK,DI,3528,1987-02-12
739656413,1034211704,2018-01-01,4,NT,P,66741,LU,HYUNDAI,I20,BLACK,PE,1396,2010-06-30
227874581,1068455764,2018-01-01,4,RT,P,164211,SN,VOLVO,V40,BLUE,DI,1870,2002-09-30
697007927,789941650,2018-01-01,4,NT,P,126213,B,HONDA,CR-V,RED,PE,1998,2002-10-04
185226095,864947656,2018-01-01,4,RT,P,103961,GL,PEUGEOT,206,GREEN,PE,1360,2001-01-16
398468525,768388704,2018-01-01,4,NT,P,62835,LU,HONDA,CIVIC,SILVER,EL,1339,2006-01-03
99929123,1410889022,2018-01-01,4,NT,PRS,128327,SS,AUDI,A4,BLUE,DI,1986,2005-09-28

==> dft_test_result-from-2018-04-01_00-00-01-to-2018-07-01_00-00-01.csv <==
test_id,vehicle_id,test_date,test_class_id,test_type,test_result,test_mileage,postcode_area,make,model,colour,fuel_type,cylinder_capacity,first_use_date
1527211683,1093474621,2018-04-01,4,NT,P,77010,LU,VAUXHALL,ASTRA,BLACK,DI,1248,2013-11-29
1441914711,1186082240,2018-04-01,4,NT,P,45033,LU,BMW,3 SERIES,WHITE,DI,1995,2012-05-09
1356617739,161669651,2018-04-01,4,NT,P,33131,LU,FORD,FOCUS,WHITE,DI,1560,2014-03-06
1399266225,588925693,2018-04-01,4,NT,P,19970,LU,SEAT,IBIZA,BLACK,PE,1197,2015-06-02
1313969253,576684283,2018-04-01,4,NT,P,18202,LU,VAUXHALL,CORSA,SILVER,DI,1248,2013-06-10
1271320767,258562511,2018-04-01,4,NT,P,67148,LU,VOLKSWAGEN,PASSAT,SILVER,DI,1968,2014-11-14
1228672281,742682739,2018-04-01,4,NT,P,22557,LU,PEUGEOT,108,BLUE,PE,998,2015-03-25
1143375309,582289043,2018-04-01,4,NT,P,40669,LU,MERCEDES-BENZ,C,BLACK,DI,2143,2014-03-27
1186023795,1142811899,2018-04-01,4,NT,F,46206,LU,BMW,520,BLACK,DI,1995,2013-08-21

==> dft_test_result-from-2018-07-01_00-00-01-to-2018-10-01_00-00-01.csv <==
test_id,vehicle_id,test_date,test_class_id,test_type,test_result,test_mileage,postcode_area,make,model,colour,fuel_type,cylinder_capacity,first_use_date
1704302935,692053805,2018-07-01,4,NT,P,91125,LU,MERCEDES-BENZ,A,BLUE,DI,2143,2014-07-17
1661654449,48596130,2018-07-01,4,NT,P,42598,EN,FORD,FIESTA,BLACK,PE,1388,2004-07-07
847004567,542554216,2018-07-01,4,NT,P,70144,SR,VOLKSWAGEN,GOLF,GREY,DI,1968,2010-11-29
1491060505,24796546,2018-07-01,4,NT,P,89108,EN,VAUXHALL,CORSA,SILVER,PE,1389,2005-09-30
1576357477,430844402,2018-07-01,4,NT,F,66139,LU,MINI,MINI (R60),WHITE,PE,1598,2012-09-01
1448412019,1329137730,2018-07-01,4,NT,P,24040,G,PEUGEOT,PARTNER,RED,DI,1560,2012-09-20
1533708991,352542429,2018-07-01,4,NT,P,90072,LU,BMW,316,GREY,DI,1995,2014-02-04
1363115047,227397037,2018-07-01,4,NT,P,89884,LU,MERCEDES-BENZ,A,GREY,DI,2143,2015-07-31
1320466561,61351339,2018-07-01,4,RT,P,59238,LU,VOLVO,V40,GREY,DI,1560,2014-07-31

==> dft_test_result-from-2018-10-01_00-00-01-to-2019-01-01_00-00-01.csv <==
test_id,vehicle_id,test_date,test_class_id,test_type,test_result,test_mileage,postcode_area,make,model,colour,fuel_type,cylinder_capacity,first_use_date
21790331,852883190,2018-10-01,4,NT,P,73487,EH,CITROEN,RELAY,WHITE,DI,2198,2008-05-30
1979141845,618264835,2018-10-01,4,NT,P,39881,LU,JAGUAR,XF,WHITE,DI,2993,2014-09-30
1936493359,466633390,2018-10-01,4,NT,P,49708,EH,PEUGEOT,BIPPER,RED,DI,1397,2010-03-25
1893844873,381209476,2018-10-01,4,NT,P,107483,CV,MITSUBISHI,ASX,BLACK,DI,1798,2010-11-11
1851196387,70846877,2018-10-01,4,NT,F,42385,LU,CITROEN,C3,BLACK,DI,1560,2016-03-14
1765899415,407497924,2018-10-01,4,NT,PRS,191984,CV,LAND ROVER,DISCOVERY,BLUE,DI,2495,1994-12-31
1595305471,1228557574,2018-10-01,4,NT,PRS,47499,RG,CITROEN,BERLINGO,SILVER,DI,1560,2009-06-08
642757577,611683723,2018-10-01,4,NT,F,19483,AB,FORD,FOCUS,SILVER,PE,998,2014-10-31
912929695,399212205,2018-10-01,4,RT,P,27136,LU,CITROEN,C4,GREY,DI,1560,2016-09-30

3.1GB of files with a total of 38.7 million rows, and the heads of the files look valid, again with a comma as the delimiter.
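Eyeballing the `head` output works, but with up to a dozen files per year a programmatic check is less error-prone. Here is a minimal Python sketch that compares each file's column names with the header shown in the transcripts, whichever of the two delimiters the file uses. `EXPECTED_COLUMNS`, `read_columns` and `mismatched_headers` are hypothetical names:

```python
import glob

# Column names copied from the headers shown in the transcripts.
EXPECTED_COLUMNS = [
    "test_id", "vehicle_id", "test_date", "test_class_id", "test_type",
    "test_result", "test_mileage", "postcode_area", "make", "model",
    "colour", "fuel_type", "cylinder_capacity", "first_use_date",
]


def read_columns(path):
    """Split the first line on ',' or '|', whichever occurs more often."""
    with open(path, "rb") as f:
        header = f.readline().decode("ascii", errors="replace").rstrip("\r\n")
    delimiter = "|" if header.count("|") > header.count(",") else ","
    return header.split(delimiter)


def mismatched_headers(pattern):
    """Return the files whose columns differ from EXPECTED_COLUMNS."""
    return [p for p in sorted(glob.glob(pattern))
            if read_columns(p) != EXPECTED_COLUMNS]
```

An empty list from `mismatched_headers("dft_test_result-from-2018-*")` would confirm what the heads suggest.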

2017

simon@NUC:~/Documents/mot_data$ unzip dft_test_result_2017.zip 
Archive:  dft_test_result_2017.zip
  inflating: test_result_31870.csv   
  inflating: test_result_31871.csv   
  inflating: test_result_31876.csv   
  inflating: test_result_31879.csv   
  inflating: test_result_31859.csv   
  inflating: test_result_31860.csv   
  inflating: test_result_31861.csv   
  inflating: test_result_31862.csv   
  inflating: test_result_31863.csv   
  inflating: test_result_31864.csv   
  inflating: test_result_31868.csv   
  inflating: test_result_31869.csv   

Yet another naming format for 2017. A quick rename keeps things consistent, so we know which files relate to which year if there are any issues when importing…

simon@NUC:~/Documents/mot_data$ rename 's/t_3/t_2017_/' *
simon@NUC:~/Documents/mot_data$ ls -lh test_result_2017_18*
-rw-rw-r-- 1 simon simon 266M Jul  4  2018 test_result_2017_1859.csv
-rw-rw-r-- 1 simon simon 265M Jul  4  2018 test_result_2017_1860.csv
-rw-rw-r-- 1 simon simon 266M Jul  4  2018 test_result_2017_1861.csv
-rw-rw-r-- 1 simon simon 266M Jul  4  2018 test_result_2017_1862.csv
-rw-rw-r-- 1 simon simon 266M Jul  4  2018 test_result_2017_1863.csv
-rw-rw-r-- 1 simon simon 265M Jul  4  2018 test_result_2017_1864.csv
-rw-rw-r-- 1 simon simon 265M Jul  4  2018 test_result_2017_1868.csv
-rw-rw-r-- 1 simon simon 265M Jul  4  2018 test_result_2017_1869.csv
-rw-rw-r-- 1 simon simon 265M Jul  4  2018 test_result_2017_1870.csv
-rw-rw-r-- 1 simon simon 266M Jul  4  2018 test_result_2017_1871.csv
-rw-rw-r-- 1 simon simon 265M Jul  4  2018 test_result_2017_1876.csv
-rw-rw-r-- 1 simon simon 265M Jul  4  2018 test_result_2017_1879.csv
simon@NUC:~/Documents/mot_data$ wc -l test_result_2017_18*
   3174847 test_result_2017_1859.csv
   3171747 test_result_2017_1860.csv
   3175419 test_result_2017_1861.csv
   3174045 test_result_2017_1862.csv
   3172553 test_result_2017_1863.csv
   3169572 test_result_2017_1864.csv
   3169093 test_result_2017_1868.csv
   3169588 test_result_2017_1869.csv
   3167012 test_result_2017_1870.csv
   3174846 test_result_2017_1871.csv
   3168767 test_result_2017_1876.csv
   3168684 test_result_2017_1879.csv
  38056173 total
simon@NUC:~/Documents/mot_data$ head test_result_2017_18*
==> test_result_2017_1859.csv <==
test_id|vehicle_id|test_date|test_class_id|test_type|test_result|test_mileage|postcode_area|make|model|colour|fuel_type|cylinder_capacity|first_use_date
526927819|465437000|2017-01-02|4|RT|P|112952|GU|MAZDA|6|GREY|PE|2261|2005-01-06
1247473239|885751324|2017-01-02|4|NT|P|57263|ME|FORD|FUSION|BLACK|PE|1596|2008-02-27
351855033|1453814350|2017-01-02|4|NT|P|32882|CR|FORD|FIESTA|BLACK|PE|1388|2011-02-17
95964117|1436475652|2017-01-02|4|NT|P|195000|PR|BMW|320|BLACK|DI|1995|2000-12-01
1328291369|442557202|2017-01-02|4|NT|P|54343|CA|VOLKSWAGEN|BEETLE|BEIGE|PE|1596|2007-10-10
1072400453|502458856|2017-01-02|4|NT|PRS|62633|M|NISSAN|ALMERA|BLUE|PE|1497|2003-01-21
304727705|899608938|2017-01-02|4|NT|P|113202|B|JAGUAR|XJ|GREY|DI|2722|2006-04-25
1409109499|1202362466|2017-01-02|4|NT|F||BS|AUDI|A4|RED|DI|1968|2010-07-29
1281164041|482959208|2017-01-02|4|NT|F|37506|DE|SKODA|FABIA|BLUE|DI|1422|2008-01-29

==> test_result_2017_1860.csv <==
test_id|vehicle_id|test_date|test_class_id|test_type|test_result|test_mileage|postcode_area|make|model|colour|fuel_type|cylinder_capacity|first_use_date
271036903|568137328|2017-01-02|4|NT|P|123645|M|VOLKSWAGEN|POLO|GREY|PE|1390|2004-12-28
1631309613|120764444|2017-01-02|4|RT|P|91889|LA|TOYOTA|YARIS|GREEN|PE|998|2003-10-30
991582323|1455897531|2017-01-02|4|NT|P|44572|CV|FORD|FOCUS|SILVER|PE|998|2013-12-31
735691407|306106474|2017-01-02|4|NT|P|71490|DH|SEAT|LEON|SILVER|PE|1598|2003-05-01
1456236827|934947086|2017-01-02|4|NT|PRS|178845|B|MINI|MINI|GREEN|PE|1598|2001-10-05
560618621|319832178|2017-01-02|4|NT|PRS|96824|B|TOYOTA|AVENSIS|SILVER|DI|1995|2006-12-08
1792945873|198845134|2017-01-02|4|NT|F|100308|RG|FORD|FIESTA|GREEN|PE|1242|1999-08-26
1537054957|509803738|2017-01-02|4|NT|P|174115|N|VAUXHALL|ZAFIRA|BLUE|DI|1910|2007-02-09
769382209|421022658|2017-01-02|4|NT|ABA||OL|NISSAN|ALMERA|BLUE|PE|1497|2000-11-24

==> test_result_2017_1861.csv <==
test_id|vehicle_id|test_date|test_class_id|test_type|test_result|test_mileage|postcode_area|make|model|colour|fuel_type|cylinder_capacity|first_use_date
1664502375|1200185902|2017-01-06|4|NT|ABR|142293|BS|FORD|FIESTA|GREEN|PE|1299|2001-02-05
143974661|8412794|2017-01-03|4|NT|P|99818|LS|MERCEDES-BENZ|SPRINTER|BLACK|DI|2148|2008-04-16
1158918247|330083118|2017-01-04|4|NT|P||DT|VOLKSWAGEN|GOLF|SILVER|DI|1896|2002-10-29
119161931|1090613800|2017-01-06|4|NT|P|61056|NP|VAUXHALL|ZAFIRA|SILVER|DI|1686|2010-07-22
1260816059|470262042|2017-01-03|7|RT|P||LA|FORD|TRANSIT|WHITE|DI|1998|2004-04-06
950529483|690396331|2017-01-03|4|NT|P|17239|S|VOLKSWAGEN|UP|RED|PE|999|2013-12-11
1941596189|153542720|2017-01-03|4|NT|P|22657|CF|VAUXHALL|INSIGNIA|SILVER|PE|1796|2012-04-30
636064453|466030066|2017-01-02|4|NT|P|172237|LS|FORD|TRANSIT CONNECT|WHITE|DI|1753|2006-12-29
1980460315|164601067|2017-01-03|4|NT|PRS|30506|CO|NISSAN|JUKE|BLACK|DI|1461|2013-11-26

==> test_result_2017_1862.csv <==
test_id|vehicle_id|test_date|test_class_id|test_type|test_result|test_mileage|postcode_area|make|model|colour|fuel_type|cylinder_capacity|first_use_date
8328237|681645488|2017-01-02|4|NT|F|65814|M|CITROEN|DS3|RED|DI|1560|2011-01-26
1752437321|765728427|2017-01-02|4|NT|P|27679|L|VAUXHALL|CORSA|SILVER|PE|1398|2013-12-31
1624491863|1420528436|2017-01-02|4|NT|P|90797|N|FORD|FIESTA|BLACK|PE|1242|2006-03-31
1240655489|635421232|2017-01-02|4|NT|P|94170|NR|MITSUBISHI|L200|MAROON|DI|2477|2002-01-01
984764573|1187401766|2017-01-02|4|NT|P|30997|PO|TOYOTA|YARIS|BLACK|PE|1296|2007-10-31
217091825|151702496|2017-01-02|4|NT|P|31265|S|FORD|FIESTA|BLACK|PE|1388|2008-12-15
89146367|544640068|2017-01-02|4|NT|F|258702|BB|PEUGEOT|307|BLACK|DI|1997|2004-11-25
1833255451|204271538|2017-01-02|4|NT|P|100228|E|SEAT|AROSA|BLACK|PE|998|2002-04-13
1449419077|1461032462|2017-01-02|4|RT|P|35860|HA|ALFA ROMEO|MITO|GREY|PE|1368|2009-12-31

==> test_result_2017_1863.csv <==
test_id|vehicle_id|test_date|test_class_id|test_type|test_result|test_mileage|postcode_area|make|model|colour|fuel_type|cylinder_capacity|first_use_date
129853893|1373806199|2017-01-02|4|NT|F|11558|DY|PEUGEOT|3008|BLUE|DI|1560|2013-12-31
1873962977|1409972139|2017-01-02|4|NT|F|34223|NN|CITROEN|DISPATCH|BLACK|DI|1560|2012-12-12
1490126603|1110481264|2017-01-02|4|NT|PRS|50541|BS|MAZDA|2|BLACK|PE|1349|2009-12-21
1234235687|1152440150|2017-01-02|4|NT|PRS|23662|DN|FORD|FOCUS|SILVER|PE|1596|2012-05-29
978344771|1148953984|2017-01-02|4|NT|PRS|71688|B|TOYOTA|AYGO|RED|PE|998|2007-12-21
722453855|681423032|2017-01-02|4|RT|P|147865|E|VAUXHALL|ZAFIRA|BLACK|DI|1995|2003-09-26
466562939|133078570|2017-01-02|4|NT|PRS|92445|SE|FORD|FIESTA|BLUE|PE|1242|2004-03-31
210672023|3680216|2017-01-02|7|NT|P|188205|CR|MERCEDES-BENZ|SPRINTER|WHITE|DI|2148|2008-01-02
1954781107|1049276364|2017-01-02|4|NT|ABR||UB|TOYOTA|PRIUS|BLACK|EL|1497|2007-11-12

==> test_result_2017_1864.csv <==
test_id|vehicle_id|test_date|test_class_id|test_type|test_result|test_mileage|postcode_area|make|model|colour|fuel_type|cylinder_capacity|first_use_date
1294600567|1453814350|2017-01-02|4|NT|ABR||CR|FORD|FIESTA|BLACK|PE|1388|2011-02-17
1166655109|1139991754|2017-01-02|4|NT|P|53536|DY|BMW|118|GREY|DI|1995|2011-07-08
654873277|1232925478|2017-01-02|4|NT|PRS|35937|AL|FIAT|500|BLACK|PE|1242|2012-04-26
15145987|103343568|2017-01-02|4|NT|PRS|157111|IP|FORD|FUSION|BLUE|DI|1399|2005-05-07
1375418697|112547112|2017-01-02|4|NT|P|141892|RM|HONDA|CR-V|BLACK|DI|2204|2008-01-03
223909575|257750630|2017-01-02|4|NT|F|50937|MK|RENAULT|CLIO|SILVER|PE|1149|2008-01-10
1712127743|669189918|2017-01-02|4|NT|F|48530|WF|VOLKSWAGEN|GOLF|RED|DI|1968|2010-11-26
944454995|659540514|2017-01-02|4|NT|P|90850|B|TOYOTA|AURIS|WHITE|PE|1797|2012-01-03
1920891331|1394897952|2017-01-02|4|NT|P|61129|RG|JAGUAR|X TYPE|BLUE|DI|1988|2004-01-09

==> test_result_2017_1868.csv <==
test_id|vehicle_id|test_date|test_class_id|test_type|test_result|test_mileage|postcode_area|make|model|colour|fuel_type|cylinder_capacity|first_use_date
15543935|228810164|2017-01-02|4|NT|PRS|66744|HA|HONDA|STREAM|BLUE|PE|1998|2004-06-22
1759653019|934785018|2017-01-02|4|NT|P|82670|IG|FORD|FIESTA|RED|PE|1388|2009-11-27
1503762103|290561228|2017-01-02|4|NT|P|40320|TW|FORD|KA|BLUE|PE|1299|2006-11-01
480198439|1258278844|2017-01-02|2|NT|P|24372|BB|YAMAHA|TTR 600 RE|BLUE|PE|595|2005-03-02
1072798401|1259526146|2017-01-02|4|NT|F|174986|BD|HONDA|CIVIC|SILVER|PE|1590|2001-09-27
1537452905|446317648|2017-01-02|4|RT|P|38642|B|MERCEDES|CLC 180|BLUE|PE|1796|2008-11-26
385943783|724431334|2017-01-02|4|RT|P|78845|RH|RENAULT|MEGANE|RED|DI|1870|2008-01-24
1234434661|1163462280|2017-01-02|4|RT|P|110076|BB|VAUXHALL|CORSA|SILVER|PE|1199|2004-06-29
1443198249|550412542|2017-01-02|4|RT|P|86386|FK|VAUXHALL|COMBO|WHITE|DI|1248|2008-06-30

==> test_result_2017_1869.csv <==
test_id|vehicle_id|test_date|test_class_id|test_type|test_result|test_mileage|postcode_area|make|model|colour|fuel_type|cylinder_capacity|first_use_date
1436181525|1379300556|2017-01-02|4|NT|P|113743|M|MERCEDES-BENZ|C|SILVER|DI|2148|2008-12-22
796454235|488253688|2017-01-02|4|NT|P|95121|MK|SEAT|LEON|RED|PE|1984|2007-10-29
412617861|799495426|2017-01-02|4|NT|P|61172|NR|VAUXHALL|CORSA|BLUE|DI|1248|2009-01-19
156726945|1449047278|2017-01-02|4|NT|P|343791|DE|AUDI|A6|GREY|DI|1871|2002-09-27
1516999655|165452987|2017-01-02|4|NT|P|39811|SA|TOYOTA|AYGO|ORANGE|PE|998|2012-12-27
1133163281|558090558|2017-01-02|4|RT|P|132273|WA|VAUXHALL|VECTRA|SILVER|PE|2198|2005-12-22
237545075|974869918|2017-01-02|4|NT|P|91510|NN|BMW|118|GREY|DI|1995|2009-09-01
1294799541|709044580|2017-01-02|4|NT|PRS|67082|DG|BMW|318I SE|BLUE|PE|1995|2008-12-12
96163091|205539776|2017-01-02|4|NT|P|131730|CO|VAUXHALL|ZAFIRA|RED|DI|1995|2000-10-29

==> test_result_2017_1870.csv <==
test_id|vehicle_id|test_date|test_class_id|test_type|test_result|test_mileage|postcode_area|make|model|colour|fuel_type|cylinder_capacity|first_use_date
561016569|1001447262|2017-01-02|4|NT|F|66314|GL|VAUXHALL|CORSA|SILVER|PE|1229|2009-04-08
305125653|1303343730|2017-01-02|4|NT|P|16665|CR|NISSAN|CUBE|WHITE|PE|1598|2010-10-01
1921289279|1064420574|2017-01-02|4|RT|P|82895|YO|VAUXHALL|ZAFIRA|BLACK|PE|1598|2007-11-06
1281561989|516766956|2017-01-02|4|NT|P|88297|LE|AUDI|A5|WHITE|DI|1968|2009-12-07
130052867|132755868|2017-01-02|4|RT|P|83877|BS|HYUNDAI|I30|GREY|DI|1582|2009-05-20
2107409|160305148|2017-01-02|4|NT|P|64047|B|FORD|S-MAX|WHITE|DI|1997|2012-03-02
210870997|718533400|2017-01-02|4|NT|P|75327|UB|TOYOTA|MPV|WHITE|PE|2400|2003-12-31
547580043|526666766|2017-01-02|4|NT|P|152928|BB|VOLKSWAGEN|GOLF|SILVER|DI|1896|2002-05-16
1012234547|664721978|2017-01-02|4|NT|P|116496|CT|LEXUS|RX300|SILVER|PE|2995|2003-05-12

==> test_result_2017_1871.csv <==
test_id|vehicle_id|test_date|test_class_id|test_type|test_result|test_mileage|postcode_area|make|model|colour|fuel_type|cylinder_capacity|first_use_date
736089355|1129907754|2017-01-02|4|NT|P|43543|SL|TOYOTA|AURIS|GREY|PE|1598|2008-03-04
224307523|1199330268|2017-01-02|4|NT|PRS|152317|E|MERCEDES|C 200|SILVER|DI|2148|2009-09-01
1840471149|672782440|2017-01-02|4|RT|P|83395|CW|RENAULT|CLIO|BLUE|PE|1149|2003-11-28
1200743859|20307830|2017-01-02|4|NT|P|66715|BB|VAUXHALL|ASTRA|GOLD|PE|1364|2004-09-20
433071111|622653478|2017-01-02|4|RT|P|61814|TF|HYUNDAI|COUPE|SILVER|PE|2656|2004-06-24
769780157|152914472|2017-01-02|4|NT|F|105888|LE|VAUXHALL|ZAFIRA|SILVER|PE|1598|2002-04-18
1746216493|1470131416|2017-01-02|4|NT|PRS|136357|GL|VOLKSWAGEN|PASSAT|SILVER|DI|1896|2003-10-02
466761913|327911534|2017-01-02|4|NT|P|192214|B|BMW|730|SILVER|DI|2993|2006-05-26
803470959|147573090|2017-01-02|4|NT|P|68137|BD|MERCEDES-BENZ|CLK|SILVER|PE|3199|2005-03-17

==> test_result_2017_1876.csv <==
test_id|vehicle_id|test_date|test_class_id|test_type|test_result|test_mileage|postcode_area|make|model|colour|fuel_type|cylinder_capacity|first_use_date
35798211|1147988404|2017-01-02|4|RT|P|76611|BD|TOYOTA|STARLET|WHITE|PE|1332|1998-08-02
372507257|972820804|2017-01-02|4|RT|P|80419|PE|VAUXHALL|ASTRA|BLACK|PE|1598|2010-01-28
453325387|1305992244|2017-01-02|4|NT|P|61567|G|AUDI|A4|WHITE|DI|1968|2012-01-27
917979891|1194827588|2017-01-02|4|NT|P|65888|HA|MAZDA|5|SILVER|PE|1999|2009-12-17
1126743479|131080144|2017-01-02|4|NT|P|105119|CM|BMW|318|SILVER|PE|1995|2006-12-22
1079616151|590584028|2017-01-02|4|NT|P|65488|BN|FORD|FIESTA|BLACK|PE|1388|2011-06-09
946546149|1267220428|2017-01-04|4|NT|P|76300|NP|FORD|TRANSIT|SILVER|DI|1998|2010-01-13
776597907|1075609506|2017-01-02|4|NT|F|158994|TW|HONDA|CR-V|SILVER|PE|1998|2005-06-02
1113306953|121477085|2017-01-02|4|RT|P|17545|SO|FORD|FIESTA|GREY|PE|998|2013-12-30

==> test_result_2017_1879.csv <==
test_id|vehicle_id|test_date|test_class_id|test_type|test_result|test_mileage|postcode_area|make|model|colour|fuel_type|cylinder_capacity|first_use_date
1699089165|1286486204|2017-01-02|4|RT|P|98840|CB|FORD|MONDEO|BLACK|DI|2198|2006-06-23
1571143707|13617278|2017-01-02|4|NT|P|118231|M|RENAULT|MEGANE|BLUE|DI|1461|2004-09-08
163743669|9904378|2017-01-02|4|RT|P|38066|NW|RENAULT|CLIO|BLUE|PE|1390|2007-07-30
500452715|1397487294|2017-01-02|4|RT|P|26616|SO|FORD|FIESTA|WHITE|PE|1388|2009-12-16
709216303|918715788|2017-01-02|4|NT|P|96050|WS|VOLKSWAGEN|GOLF|GREY|PE|1390|2007-05-04
1045925349|1337436588|2017-01-02|4|NT|P|98042|PO|FORD|GRAND C-MAX|GREY|DI|1560|2011-01-31
1254688937|1410604904|2017-01-02|4|NT|PRS|67347|PR|CITROEN|DS3|GREY|DI|1560|2011-09-28
1335507067|875005856|2017-01-02|4|NT|P|70755|B|HONDA|CIVIC|GREY|PE|1339|2006-06-13
951670693|962559766|2017-01-02|4|NT|P|62915|HA|BMW|535|BLACK|DI|2993|2011-03-18

3.1GB and 38 million rows, and the heads of the files look good. We’re back to | as the delimiter now.
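The `rename` used for the 2017 files is the Perl-style one, which applies `s/t_3/t_2017_/` to the first match in each file name. Here is a Python equivalent, as a sketch. It assumes the original names all look like `test_result_3NNNN.csv`, and it deliberately globs only those files rather than `*`, so nothing else in the directory can be caught by the pattern. `rename_2017` is a hypothetical name:

```python
from pathlib import Path


def rename_2017(directory="."):
    """Rename test_result_3NNNN.csv to test_result_2017_NNNN.csv.

    Like the Perl expression s/t_3/t_2017_/, only the first 't_3' is replaced.
    Returns a mapping of old name -> new name.
    """
    renamed = {}
    # sorted() materialises the glob before any file is renamed.
    for path in sorted(Path(directory).glob("test_result_3*.csv")):
        new_name = path.name.replace("t_3", "t_2017_", 1)
        path.rename(path.with_name(new_name))
        renamed[path.name] = new_name
    return renamed
```

Renamed files no longer match `test_result_3*.csv`, so running the function a second time does nothing.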

2016

For 2016 and earlier, the files are compressed with gzip…

simon@NUC:~/Documents/mot_data$ gunzip test_result_2016.txt.gz 
simon@NUC:~/Documents/mot_data$ ls -lh test_result_2016.txt 
-rw-rw-r-- 1 simon simon 3.1G Jan 14 15:46 test_result_2016.txt
simon@NUC:~/Documents/mot_data$ wc -l test_result_2016.txt 
37693381 test_result_2016.txt
simon@NUC:~/Documents/mot_data$ head test_result_2016.txt 
test_id|vehicle_id|test_date|test_class_id|test_type|test_result|test_mileage|postcode_area|make|model|colour|fuel_type|cylinder_capacity|first_use_date
1645480751|1374211238|2016-01-01|4|NT|P|117033|SM|VOLKSWAGEN|POLO|BLACK|PE|1600|2000-06-23
1393462389|1153769898|2016-01-01|4|NT|P|99292|NE|VOLKSWAGEN|PASSAT|BLUE|DI|1968|2006-11-30
1863202023|1485039300|2016-01-01|7|NT|PRS|170320|E|MERCEDES|SPRINTER 313 CDI LWB|WHITE|DI|2148|2005-01-14
1304292863|1097073904|2016-01-01|4|RT|P|70623|NN|MINI|MINI|GREY|PE|1598|2004-04-08
845810407|1166548800|2016-01-01|4|NT|P|21567|DL|NISSAN|JUKE|BLUE|PE|1612|2011-07-07
1474886807|1397571962|2016-01-01|4|NT|P|62207|CR|SUZUKI|WAGON-R+|SILVER|PE|1328|2005-02-09
1560183779|798718734|2016-01-01|4|NT|P|44855|PE|FIAT|PUNTO ACTIVE 8V|BLUE|PE|1242|2005-12-19
1314962575|631812270|2016-01-01|4|NT|P|163692|NE|RENAULT|TRAFIC|BLUE|DI|1870|2003-03-28
1517535293|244951922|2016-01-01|4|NT|P|41057|BS|DAEWOO|MATIZ|SILVER|PE|796|2003-09-01

A single file of 3.1GB with 37.7 million rows, and the head of the file looks good.
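Decompressing each year to disk costs roughly another 3GB per file. If the aim is only the row count and header check, Python's standard `gzip` module can stream the compressed file instead. Here is a minimal sketch with hypothetical helper names `gz_header` and `gz_line_count`:

```python
import gzip


def gz_header(path):
    """Return the first line of a gzip-compressed text file."""
    with gzip.open(path, "rb") as f:
        return f.readline().decode("ascii", errors="replace").rstrip("\r\n")


def gz_line_count(path, chunk_size=1 << 20):
    """Count newline bytes in the decompressed stream, like `zcat FILE | wc -l`."""
    n = 0
    with gzip.open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            n += chunk.count(b"\n")
    return n
```

`gz_line_count("test_result_2016.txt.gz")` should match the `wc -l` figure above without writing the uncompressed file, although it still has to decompress the whole stream in memory as it goes.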

2015

simon@NUC:~/Documents/mot_data$ gunzip test_result_2015.txt.gz 
simon@NUC:~/Documents/mot_data$ ls -lh test_result_2015.txt 
-rw-rw-r-- 1 simon simon 3.1G Jan 14 15:46 test_result_2015.txt
simon@NUC:~/Documents/mot_data$ wc -l test_result_2015.txt 
37490737 test_result_2015.txt
simon@NUC:~/Documents/mot_data$ head test_result_2015.txt 
test_id|vehicle_id|test_date|test_class_id|test_type|test_result|test_mileage|postcode_area|make|model|colour|fuel_type|cylinder_capacity|first_use_date
1380469872|1023201026|2015-01-01|1|NT|ABR||OX|HONDA|XL125V|SILVER|PE|125|2003-06-06
1844368232|170614898|2015-01-01|1|NT|ABR||WD|AJS|DD 125 E 08|BLACK|PE|125|2009-12-21
938902988|980532916|2015-01-01|4|NT|ABR||KT|SUBARU|IMPREZA|SILVER|PE|2457|2006-05-22
1989308134|1258278786|2015-01-01|4|RT|P|107604|SN|LAND ROVER|DISCOVERY|GREEN|DI|2720|2005-01-18
1100975468|343484998|2015-01-01|4|RT|P|79858|CV|RENAULT|SCENIC|BLUE|DI|1461|2004-09-17
511507734|636234930|2015-01-01|4|NT|P|91006|CM|ALFA ROMEO|GT|SILVER|PE|1970|2004-12-09
1625020172|1051800982|2015-01-01|4|NT|P|109353|DH|ISUZU|RODEO|BLACK|DI|2999|2003-12-02
18614018|1429705738|2015-01-01|4|NT|P|109074|BB|VAUXHALL|ZAFIRA|BLACK|DI|1995|2004-07-19
1872117038|1115925744|2015-01-01|4|RT|P|154396|CB|BMW|X5|SILVER|DI|2926|2002-09-19

Very similar stats to 2016: 3.1GB, 37.5 million rows, and the file contents look good.
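One detail the heads do reveal is that `test_mileage` is not always populated. The first three 2015 rows, all `ABR` results, have an empty mileage, and the 2019 comma-separated files write the same gap as `""`. Python's `csv` module reads both forms as an empty string. A minimal parsing sketch therefore only needs to map that to `None`. `parse_mileage` and `mileages` are hypothetical helpers:

```python
import csv
import io


def parse_mileage(field):
    """Convert a test_mileage field to int, or None when it is blank."""
    field = field.strip()
    return int(field) if field else None


def mileages(text, delimiter):
    """Parse delimited text with a header row and return the test_mileage values."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return [parse_mileage(row["test_mileage"]) for row in reader]
```

Keeping blanks as `None` rather than `0` avoids skewing any later mileage averages.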

2014

simon@NUC:~/Documents/mot_data$ gunzip test_result_2014.txt.gz 
simon@NUC:~/Documents/mot_data$ ls -lh test_result_2014.txt 
-rw-rw-r-- 1 simon simon 3.1G Jan 14 15:46 test_result_2014.txt
simon@NUC:~/Documents/mot_data$ wc -l test_result_2014.txt 
37493826 test_result_2014.txt
simon@NUC:~/Documents/mot_data$ head test_result_2014.txt 
test_id|vehicle_id|test_date|test_class_id|test_type|test_result|test_mileage|postcode_area|make|model|colour|fuel_type|cylinder_capacity|first_use_date
75849048|98956394|2014-01-01|4|RT|P|71002|BD|FORD|FOCUS|SILVER|PE|1596|2003-12-01
1893007630|479820560|2014-01-01|4|RT|P|64836|HU|VAUXHALL|CORSA|GREEN|PE|973|1998-12-31
1786152630|659469072|2014-01-01|4|NT|P|143739|SL|LAND ROVER|DISCOVERY|BLUE|DI|2495|1996-08-06
888025356|951392102|2014-01-01|4|NT|F|62949|LE|DAEWOO|MATIZ|GREEN|PE|796|2002-07-22
159215682|1230516026|2014-01-01|4|NT|P|94954|UB|BMW|318|GREY|PE|1995|2006-07-01
640798820|759806792|2014-01-01|4|NT|P|82823|SL|VOLKSWAGEN|POLO|GREEN|PE|1390|1999-05-05
866472130|416537576|2014-01-01|4|RT|P|100789|BB|FORD|FOCUS|SILVER|PE|1796|2001-05-01
548721000|480095260|2014-01-01|4|NT|F|94968|GL|ISUZU|TF|GREEN|DI|2499|2003-07-16
1985242428|1364748290|2014-01-01|4|NT|P|43526|WS|VAUXHALL|ASTRA|BLACK|PE|1796|2010-01-29

3.1GB with 37.5 million rows and file contents looking good.

2013

simon@NUC:~/Documents/mot_data$ gunzip test_result_2013.txt.gz 
simon@NUC:~/Documents/mot_data$ ls -lh test_result_2013.txt 
-rw-rw-r-- 1 simon simon 3.1G Jan 14 15:46 test_result_2013.txt
simon@NUC:~/Documents/mot_data$ wc -l test_result_2013.txt 
37361926 test_result_2013.txt
simon@NUC:~/Documents/mot_data$ head test_result_2013.txt 
test_id|vehicle_id|test_date|test_class_id|test_type|test_result|test_mileage|postcode_area|make|model|colour|fuel_type|cylinder_capacity|first_use_date
608789348|738799388|2013-01-01|4|NT|F|104217|WF|HYUNDAI|GETZ|SILVER|DI|1493|2006-08-17
1243374888|1284015018|2013-01-01|4|NT|PRS|48807|WF|FORD|FOCUS|BLUE|PE|1596|2008-11-26
1382967316|1025199626|2013-01-01|4|RT|P|85159|S|FORD|TRANSIT CONNECT|WHITE|DI|1753|2006-01-26
566374210|159460880|2013-01-01|4|NT|P|138483|WF|VAUXHALL|MOVANO|WHITE|DI|2463|2007-12-17
496179986|366607340|2013-01-01|4|NT|F|76427|B|SEAT|AROSA|SILVER|PE|998|2002-11-07
502846824|1359633208|2013-01-01|4|NT|F|100982|B|VAUXHALL|ASTRA|BLACK|PE|1598|2005-03-01
1199477564|1184417722|2013-01-01|4|NT|F|76525|NP|FORD|FOCUS|BLUE|DI|1753|2003-10-10
1794049122|1125863740|2013-01-01|4|NT|P|127122|BB|CHRYSLER|VOYAGER|PURPLE|DI|2776|1999-09-13
1489198368|45724212|2013-01-01|4|NT|P|45651|HX|TOYOTA|COROLLA VERSO|RED|PE|1794|2006-01-16

3.1GB with 37.4 million rows and file contents looking good.

2012

simon@NUC:~/Documents/mot_data$ gunzip test_result_2012.txt.gz 
simon@NUC:~/Documents/mot_data$ ls -lh test_result_2012.txt 
-rw-rw-r-- 1 simon simon 3.0G Jan 14 15:46 test_result_2012.txt
simon@NUC:~/Documents/mot_data$ wc -l test_result_2012.txt 
36846343 test_result_2012.txt
simon@NUC:~/Documents/mot_data$ head test_result_2012.txt 
test_id|vehicle_id|test_date|test_class_id|test_type|test_result|test_mileage|postcode_area|make|model|colour|fuel_type|cylinder_capacity|first_use_date
453094722|1336915420|2012-01-01|4|NT|F|89540|S|DAEWOO|MATIZ|GREEN|PE|796|2002-03-22
1987609170|496266720|2012-01-01|4|RT|P|16380|S|PEUGEOT|306|YELLOW|DI|1905|1999-08-09
903083256|118415214|2012-01-01|4|NT|P|96341|WF|VAUXHALL|MOVANO|WHITE|DI|2464|2008-01-03
1112913888|763268186|2012-01-01|4|NT|P|48571|WF|VAUXHALL|ASTRA|WHITE|DI|1248|2008-12-14
1559139028|1432194588|2012-01-01|4|RT|P|111319|PE|AUDI|A2|BLUE|PE|1390|2001-11-16
782610276|955517208|2012-01-01|4|NT|PRS|75455|HU|NISSAN|MICRA|WHITE|PE|998|1995-01-31
1754413996|455754094|2012-01-01|4|NT|F|56120|WF|VAUXHALL|ASTRA|WHITE|DI|1248|2008-12-14
1067194566|1071396330|2012-01-01|4|RT|P|151189|SN|SUZUKI|VITARA|BLACK|DI|1590|1997-08-01
1941098522|551536520|2012-01-01|4|NT|P|79042|CW|PEUGEOT|307|GREY|DI|1560|2006-09-26

3.0GB with 36.8 million rows and file contents looking good.

2011

simon@NUC:~/Documents/mot_data$ gunzip test_result_2011.txt.gz 
simon@NUC:~/Documents/mot_data$ ls -lh test_result_2011.txt 
-rw-rw-r-- 1 simon simon 3.0G Jan 14 15:46 test_result_2011.txt
simon@NUC:~/Documents/mot_data$ wc -l test_result_2011.txt 
36849155 test_result_2011.txt
simon@NUC:~/Documents/mot_data$ head test_result_2011.txt 
test_id|vehicle_id|test_date|test_class_id|test_type|test_result|test_mileage|postcode_area|make|model|colour|fuel_type|cylinder_capacity|first_use_date
539690916|244877620|2011-01-01|4|NT|P|54127|NR|VOLKSWAGEN|POLO|SILVER|PE|1390|2003-01-03
201563620|1094583990|2011-01-01|4|NT|PRS|63997|NR|FORD|FOCUS|BLACK|PE|1988|2002-07-22
1208708976|135050588|2011-01-01|4|NT|F|83946|SA|FORD|FIESTA|BLUE|PE|1299|1998-12-31
1898167378|507467582|2011-01-01|4|NT|P|45917|NP|MG|B GT|WHITE|PE|1798|1971-07-01
367395770|1280070280|2011-01-01|4|NT|P|229153|LE|BMW|525|GREEN|DI|2498|1996-01-30
732978810|936792652|2011-01-01|4|RT|P|122252|SS|AUDI|A3|BLACK|DI|1968|2004-10-25
379165572|302739506|2011-01-01|4|NT|P|156126|WR|VOLVO|V40|SILVER|DI|1870|2002-03-28
1532977216|1490017506|2011-01-01|4|RT|P|58621|RG|LAND ROVER|DEFENDER 110|GREEN|DI|2495|1995-08-01
949214096|63021908|2011-01-01|4|NT|P|84862|N|MERCEDES|C 180K|BLUE|PE|1796|2003-12-22

3.0GB with 36.8 million rows and data looking good.
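Every header line shown so far is identical. Instead of eyeballing this year by year, it could be checked in one pass by counting distinct first lines across the extracted files. This is a minimal sketch, assuming the test_result_<year>.txt names used here. distinct_headers is a hypothetical helper, and a result of 1 would mean every file shares the same column layout.

```shell
#!/usr/bin/env bash
# Sketch: report how many distinct header lines a set of files has.
# A result of 1 means every file starts with the same header.

distinct_headers() {
  local f
  for f in "$@"; do
    head -n 1 "$f"
  done | sort -u | wc -l
}

# Assumes the extracted files are named test_result_<year>.txt
shopt -s nullglob
files=(test_result_*.txt)
if (( ${#files[@]} > 0 )); then
  echo "${#files[@]} files, $(distinct_headers "${files[@]}") distinct header(s)"
fi
```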

2010 #

simon@NUC:~/Documents/mot_data$ gunzip test_result_2010.txt.gz 
simon@NUC:~/Documents/mot_data$ ls -lh test_result_2010.txt 
-rw-rw-r-- 1 simon simon 3.0G Jan 14 15:46 test_result_2010.txt
simon@NUC:~/Documents/mot_data$ wc -l test_result_2010.txt 
36134921 test_result_2010.txt
simon@NUC:~/Documents/mot_data$ head test_result_2010.txt 
test_id|vehicle_id|test_date|test_class_id|test_type|test_result|test_mileage|postcode_area|make|model|colour|fuel_type|cylinder_capacity|first_use_date
806196694|188581750|2010-01-01|7|NT|ABR||CH|FORD|TRANSIT|WHITE|DI|2402|2003-01-15
1471475604|753317992|2010-01-01|4|NT|F|84722|BD|VAUXHALL|ZAFIRA|BLACK|PE|1796|2001-10-18
1392494248|272470820|2010-01-01|4|NT|P|10114|DE|FIAT|DUCATO|WHITE|DI|2800|2000-11-17
1735013826|1366770062|2010-01-01|4|NT|PRS|78368|BD|TOYOTA|COROLLA|WHITE|PE|1332|1995-08-31
1629073590|49902474|2010-01-01|4|NT|PRS|51000|BL|PEUGEOT|307|SILVER|PE|1360|2003-07-17
767336288|673773234|2010-01-01|4|NT|P|153356|BD|PEUGEOT|307|BLUE|DI|1997|2003-12-06
1873135550|257169262|2010-01-01|4|RT|P|79294|BD|FORD|FOCUS|BLUE|PE|1596|1999-11-24
1559183164|976959056|2010-01-01|4|NT|P|137263|WV|TOYOTA|LUCIDA|BLUE|DI|2184|1997-01-01
1890031062|967342532|2010-01-01|4|NT|F|133476|W|TOYOTA|ESTIMA 2WD AUTO|SILVER|DI|2180|1996-12-31

3.0GB with 36.1 million rows and file contents look good.

2009 #

simon@NUC:~/Documents/mot_data$ gunzip test_result_2009.txt.gz 
simon@NUC:~/Documents/mot_data$ ls -lh test_result_2009.txt 
-rw-rw-r-- 1 simon simon 2.9G Jan 14 15:45 test_result_2009.txt
simon@NUC:~/Documents/mot_data$ wc -l test_result_2009.txt 
35436944 test_result_2009.txt
simon@NUC:~/Documents/mot_data$ head test_result_2009.txt 
test_id|vehicle_id|test_date|test_class_id|test_type|test_result|test_mileage|postcode_area|make|model|colour|fuel_type|cylinder_capacity|first_use_date
1379380200|1336022860|2009-01-01|4|NT|ABR||DE|MERCEDES|C 250|BLUE|DI|2497|1997-02-26
717802790|304757246|2009-01-01|4|NT|P|49013|BD|ROVER|25 IMPRESSION S|BLUE|PE|1396|2001-03-31
1792798112|466136480|2009-01-01|4|NT|P|196306|DE|RENAULT|UNCLASSIFIED|WHITE|DI|2463|2004-04-13
1082280248|1267482366|2009-01-01|4|NT|F|111686|BB|ROVER|416|RED|PE|1589|1999-12-31
1366813380|64410314|2009-01-01|4|RT|P|100460|IP|RENAULT|MEGANE SCENIC|RED|PE|1598|1998-05-27
1275514134|1452931758|2009-01-01|4|NT|P|88176|CH|RENAULT|MEGANE|BLUE|DI|1461|2005-01-05
1461097168|343630626|2009-01-01|4|RT|P|76606|HU|AUDI|TT|SILVER|PE|1781|2002-05-13
295185618|149270444|2009-01-01|4|NT|P|56610|W|VAUXHALL|ASTRA|GREY|PE|1389|2002-09-04
1761887058|704695744|2009-01-01|4|RT|P|68062|GU|AUDI|A2|GREY|DI|1422|2002-01-14

2.9GB with 35.4 million rows and file contents look good.

2008 #

simon@NUC:~/Documents/mot_data$ gunzip test_result_2008.txt.gz 
simon@NUC:~/Documents/mot_data$ ls -lh test_result_2008.txt 
-rw-rw-r-- 1 simon simon 2.8G Jan 14 15:45 test_result_2008.txt
simon@NUC:~/Documents/mot_data$ wc -l test_result_2008.txt 
34439133 test_result_2008.txt
simon@NUC:~/Documents/mot_data$ head test_result_2008.txt 
test_id|vehicle_id|test_date|test_class_id|test_type|test_result|test_mileage|postcode_area|make|model|colour|fuel_type|cylinder_capacity|first_use_date
317588902|124592814|2008-01-01|4|NT|P|189567|NE|RENAULT|ESPACE|BLUE|PE|1995|1993-01-01
1385282064|1165558462|2008-01-01|4|NT|F|116718|BB|TOYOTA|CARINA E|BLUE|PE|1587|1994-04-26
1425711724|587883270|2008-01-01|4|RT|P|101904|BB|NISSAN|ALMERA|GREEN|PE|1392|1997-12-10
594025786|687800162|2008-01-01|4|NT|ABA||B|HONDA|CIVIC|BLUE|PE|1590|2005-02-24
1741336990|641514360|2008-01-01|2|NT|P|31926|TA|MATCHLESS|G3LS|BLACK|PE|347|1958-07-08
1377218714|1203478532|2008-01-01|4|NT|P|39225|BD|PEUGEOT|306|SILVER|PE|1761|2001-03-01
681413512|1296120616|2008-01-01|4|RT|P|135651|HX|MITSUBISHI|SHOGUN|SILVER|DI|2835|1992-10-09
1771273408|399202526|2008-01-01|4|NT|PRS|112790|BS|RENAULT|CLIO|RED|PE|1390|1996-07-23
681215928|1414798484|2008-01-01|4|NT|F|75014|BB|HONDA|CIVIC|BLUE|PE|1396|1998-02-02

2.8GB with 34.4 million rows and file contents look good.

2007 #

simon@NUC:~/Documents/mot_data$ gunzip test_result_2007.txt.gz 
simon@NUC:~/Documents/mot_data$ ls -lh test_result_2007.txt 
-rw-rw-r-- 1 simon simon 2.8G Jan 14 15:45 test_result_2007.txt
simon@NUC:~/Documents/mot_data$ wc -l test_result_2007.txt 
33591239 test_result_2007.txt
simon@NUC:~/Documents/mot_data$ head test_result_2007.txt 
test_id|vehicle_id|test_date|test_class_id|test_type|test_result|test_mileage|postcode_area|make|model|colour|fuel_type|cylinder_capacity|first_use_date
808298134|151699072|2007-01-01|4|NT|ABR||SK|FORD|MAVERICK|GREEN|PE|1988|2002-05-02
842444180|1291028996|2007-01-01|4|RT|P|97109|HU|VAUXHALL|ASTRA|WHITE|DI|1700|1999-03-31
348649550|174602976|2007-01-01|4|NT|PRS|28389|M|VAUXHALL|CAVALIER L|GOLD|PE|1598|1992-01-24
1828509444|369109734|2007-01-01|4|NT|P|82088|E|RENAULT|LAGUNA|SILVER|PE|1998|1996-03-29
1697977864|800848976|2007-01-01|4|NT|F|96285|EX|VAUXHALL|VECTRA|SILVER|PE|1799|1999-03-29
966004542|378981876|2007-01-01|4|RT|P|110393|BD|FORD|GALAXY ZETEC TDI|SILVER|DI|1896|2001-12-10
1307568758|1479243936|2007-01-01|4|NT|P|113044|WF|VAUXHALL|ASTRA|BLUE|PE|1389|1993-04-06
1988447032|1223225444|2007-01-01|4|NT|P|101674|BB|HONDA|ACCORD|GREEN|PE|1850|1998-11-24
1123460628|429612316|2007-01-01|4|NT|PRS|78053|BB|HYUNDAI|ACCENT|WHITE|PE|1341|2000-01-15

2.8GB with 33.6 million rows and file contents looking good.

2006 #

simon@NUC:~/Documents/mot_data$ gunzip test_result_2006.txt.gz 
simon@NUC:~/Documents/mot_data$ ls -lh test_result_2006.txt 
-rw-rw-r-- 1 simon simon 2.6G Jan 14 15:44 test_result_2006.txt
simon@NUC:~/Documents/mot_data$ wc -l test_result_2006.txt 
32014081 test_result_2006.txt
simon@NUC:~/Documents/mot_data$ head test_result_2006.txt 
test_id|vehicle_id|test_date|test_class_id|test_type|test_result|test_mileage|postcode_area|make|model|colour|fuel_type|cylinder_capacity|first_use_date
1949156228|816235882|2006-01-01|4|NT|P|54101|SA|AUSTIN|MINI MAYFAIR|GREEN|PE|998|1985-05-29
1231051602|1038868250|2006-01-01|4|NT|P|134032|CW|MITSUBISHI|SHOGUN|WHITE|DI|2477|1989-12-31
1976279366|416238784|2006-01-01|0|NT|P|100087|BD|VAUXHALL|UNCLASSIFIED|BLUE|PE||
737220590|967257220|2006-01-01|4|NT|P|95802|SE|FORD|ESCORT|RED|PE|1597|1999-04-30
1564944812|360370778|2006-01-01|4|NT|P|18325|PO|KIA|CARENS|BLUE|PE|1793|2000-09-27
1102734306|270469542|2006-01-01|4|NT|P|27325|NE|RENAULT|CLIO|GREEN|PE|1149|2001-02-20
655291568|957107084|2006-01-01|4|RT|P||HR|CITROEN|SAXO FORTE|ORANGE|PE|1124|2000-04-27
1252564280|1020366628|2006-01-01|4|RT|P|15621|E|NISSAN|MICRA GX AUTO|WHITE|PE|1275|1997-05-29
1232280538|46559388|2006-01-01|4|NT|P|51937|E|NISSAN|UNCLASSIFIED|GREEN|PE|1392|1997-12-11

2.6GB with 32.0 million rows and the head of the file looks good.

2005 #

simon@NUC:~/Documents/mot_data$ gunzip test_result_2005.txt.gz 
simon@NUC:~/Documents/mot_data$ ls -lh test_result_2005.txt 
-rw-rw-r-- 1 simon simon 621M Jan 14 15:29 test_result_2005.txt
simon@NUC:~/Documents/mot_data$ wc -l test_result_2005.txt 
7499745 test_result_2005.txt
simon@NUC:~/Documents/mot_data$ head test_result_2005.txt 
test_id|vehicle_id|test_date|test_class_id|test_type|test_result|test_mileage|postcode_area|make|model|colour|fuel_type|cylinder_capacity|first_use_date
804664368|256274986|2005-01-01|0|NT|P|23459|TF|FORD|UNCLASSIFIED|SILVER|PE||
392603376|633988704|2005-01-01|0|NT|P|40961|E|LOTUS|UNCLASSIFIED|RED|PE||
1894843206|1320781748|2005-01-01|0|NT|P|16416|S|VAUXHALL|UNCLASSIFIED|BLUE|PE||
830908928|1263031090|2005-01-01|4|NT|P|93318|W|LAND ROVER|109 V8 S.W.|BLUE|PE|3528|1981-04-06
727535460|1123257842|2005-01-01|4|NT|P|121930|RG|CITROEN|AX|WHITE|DI|1360|1993-08-31
207507680|1168225356|2005-01-01|0|NT|P|122296|FK|CHRYSLER|UNCLASSIFIED|BLACK|DI||
932135720|215535474|2005-01-01|4|NT|P|74823|DG|VAUXHALL|VECTRA|BLUE|PE|1598|1996-10-21
1932156144|1100578334|2005-01-01|4|NT|P|63133|SY|VAUXHALL|CORSA|GREY|PE|1389|1997-03-28
1416289564|1239943850|2005-01-01|4|NT|P|73256|SN|FORD|FIESTA|SILVER|PE|1119|1994-05-25

621MB with 7.5 million rows.

The low number of rows in 2005 is explained in the accompanying guide to the data, which states…

Computerisation was not fully implemented across Great Britain until 01/04/2006, therefore the dataset will not contain all tests performed between 01/01/2005 and 31/03/2006. The data encompasses all tests for which a valid MOT pass could have been a potential outcome.
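The guide's statement could be checked against the data itself by counting tests per month from the test_date column, the third pipe-separated field. This is a minimal awk sketch, assuming the extracted file names used above. tests_per_month is a hypothetical helper. If the guide is right, the 2005 months and the first quarter of 2006 would be expected to show fewer tests than later months. The actual counts are not shown here.

```shell
#!/usr/bin/env bash
# Sketch: tally tests per month from test_date (3rd pipe-separated field).

tests_per_month() {
  # NR > 1 skips the header line; substr keeps YYYY-MM from YYYY-MM-DD
  awk -F'|' 'NR > 1 { count[substr($3, 1, 7)]++ }
             END { for (m in count) print m, count[m] }' "$1" | sort
}

# Assumes the extracted files from this post are in the current directory
for f in test_result_2005.txt test_result_2006.txt; do
  if [[ -f "$f" ]]; then
    tests_per_month "$f"
  fi
done
```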

Next Step #

The whole data set looks good. The next step is to load it into a DuckDB database, which is covered in the next post in this project.